CN111401083A - Name identification method and device, storage medium and processor - Google Patents

Info

Publication number
CN111401083A
Authority
CN
China
Prior art keywords
name
word segmentation
user
database
identifying
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910002379.2A
Other languages
Chinese (zh)
Other versions
CN111401083B (en)
Inventor
徐光伟
李辰
包祖贻
刘恒友
李林琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910002379.2A priority Critical patent/CN111401083B/en
Publication of CN111401083A publication Critical patent/CN111401083A/en
Application granted granted Critical
Publication of CN111401083B publication Critical patent/CN111401083B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Telephonic Communication Services (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a name identification method and device, a storage medium and a processor. The method comprises: acquiring interactive text during communication among a plurality of users; acquiring a name database, wherein the name database stores user names of at least one user group; and identifying the user name in the interactive text based on the name database and a universal name database, wherein the universal name database stores a plurality of universal user names. The invention solves the technical problem of low name identification accuracy caused by poor handling of unique names in real-time communication.

Description

Name identification method and device, storage medium and processor
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to a name identification method and apparatus, a storage medium, and a processor.
Background
Machine translation can relieve communication barriers between people who speak different languages, and Chinese-English translation is its most widespread application. Machine translation based on machine learning has been successfully applied to many translation scenarios, and real-time communication translation also uses this technology. Chinese name recognition is an important basic task in natural language processing. In translation, Chinese names are usually recognized in advance by a preceding module, and the recognized names are translated directly according to pinyin or corresponding conversion rules. If a name is not recognized, the translation model may translate it word by word as ordinary text; for example, a name with the surname Liu may be rendered as a phrase rather than as a name. Unique names used in real-time communication are handled poorly in this way, so the accuracy of name identification is low.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a name identification method and device, a storage medium and a processor, which at least solve the technical problem of low accuracy of name identification caused by poor processing effect on a specific name in real-time communication.
According to an aspect of an embodiment of the present invention, there is provided a name identification method, including: acquiring interactive text during communication among a plurality of users; acquiring a name database, wherein the name database stores user names of at least one user group; and identifying the user name in the interactive text based on a universal name database and the name database, wherein the universal name database stores a plurality of universal user names.
Further, identifying the user name in the interactive text based on the universal name database and the name database comprises: performing word segmentation processing on the interactive text based on the name database and the universal name database to obtain a word segmentation result, wherein the word segmentation result comprises at least one candidate user name; and identifying the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
Further, performing word segmentation processing on the interactive text based on the universal name database and the name database to obtain a word segmentation result includes: performing word segmentation processing on the interactive text by adopting a forward maximum matching algorithm according to the universal name database to obtain a first word segmentation result set; performing word segmentation processing on the interactive text by adopting a forward maximum matching algorithm according to the name database to obtain a second word segmentation result set; and selecting, by adopting a word segmentation model, the word segmentation result with the highest probability from the first word segmentation result set and the second word segmentation result set as the word segmentation result of the interactive text.
Further, identifying a user name in the interaction text from the at least one candidate user name in conjunction with the context information of the interaction text comprises: and identifying the at least one candidate user name by adopting a classification model combined with context characteristics so as to determine whether the at least one candidate user name is a user name in the interactive text.
Further, the classification model includes a convolutional neural network model.
Further, after identifying the user name in the interaction text, the method further comprises: translating the user name in the identified interactive text by adopting a user name rule to obtain a first translation result; translating the texts except the identified user name in the interactive texts by adopting a machine translation model to obtain a second translation result; and obtaining a translation result of the interactive text in the communication process based on the first translation result and the second translation result.
Further, acquiring the name database includes: if the plurality of users belong to the same user group, acquiring the name database corresponding to that user group; if the plurality of users belong to a plurality of user groups, acquiring the name database corresponding to each user group to obtain a plurality of name databases, and taking the plurality of name databases as the name database of the at least one user group.
According to another aspect of the embodiments of the present invention, there is also provided a name identification apparatus, including: a first acquisition unit, configured to acquire interactive text during communication among a plurality of users; a second acquisition unit, configured to acquire a name database, wherein the name database stores user names of at least one user group; and an identification unit, configured to identify the user name in the interactive text based on a universal name database and the name database, wherein the universal name database stores a plurality of universal user names.
Further, the identification unit includes: a processing module, configured to perform word segmentation processing on the interactive text according to the name database and the universal name database to obtain a word segmentation result, wherein the word segmentation result comprises at least one candidate user name; and an identification module, configured to identify the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
Further, the processing module includes: a first processing submodule, configured to perform word segmentation processing on the interactive text by adopting a forward maximum matching algorithm according to the universal name database to obtain a first word segmentation result set; a second processing submodule, configured to perform word segmentation processing on the interactive text by adopting a forward maximum matching algorithm according to the name database to obtain a second word segmentation result set; and a determining submodule, configured to select, by adopting a word segmentation model, the word segmentation result with the highest probability from the first word segmentation result set and the second word segmentation result set as the word segmentation result of the interactive text.
Further, the identification module includes: and the identification submodule is used for identifying the at least one candidate user name by adopting a classification model combined with the context characteristics so as to determine whether the at least one candidate user name is the user name in the interactive text.
Further, the classification model includes a convolutional neural network model.
Further, the apparatus further comprises: the first translation unit is used for translating the user name in the identified interactive text by adopting a user name rule after the user name in the interactive text is identified to obtain a first translation result; the second translation unit is used for translating the texts except the identified user name in the interactive texts by adopting a machine translation model to obtain a second translation result; and the determining unit is used for obtaining the translation result of the interactive text in the communication process according to the first translation result and the second translation result.
Further, the second acquisition unit includes: a first acquisition module, configured to acquire the name database corresponding to the user group when the plurality of users belong to the same user group; and a second acquisition module, configured to, when the plurality of users belong to a plurality of user groups, acquire the name database corresponding to each user group to obtain a plurality of name databases, and take the plurality of name databases as the name database of the at least one user group.
According to an aspect of the embodiments of the present invention, there is provided a storage medium including a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to execute the method for identifying a name as described in any one of the above items.
According to an aspect of the embodiments of the present invention, there is provided a processor, configured to execute a program, where the program executes to perform the name identification method described in any one of the above.
In the embodiment of the invention, the name database is combined with the universal name database: interactive text is acquired during communication among a plurality of users; a name database storing user names of at least one user group is acquired; and the user name in the interactive text is identified based on the name database and the universal name database, which stores a plurality of universal user names. User names are thereby identified accurately, which improves the identification accuracy of unique names in real-time communication and solves the technical problem of low name identification accuracy caused by poor handling of unique names in real-time communication.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of a hardware configuration of a computer terminal according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of identifying a name according to an embodiment of the invention;
FIG. 3 is a diagram illustrating a method for identifying a name according to an embodiment of the invention;
FIG. 4 is a schematic diagram of a name identification device according to an embodiment of the invention; and
FIG. 5 is a block diagram of an alternative computer terminal according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms appearing in the description of the embodiments of the present application are explained as follows:
Machine translation: intelligent translation by machine using machine learning algorithms.
Supervised learning: training a machine learning model using a data set with manual annotations.
CNN: a convolutional neural network.
RNN: a recurrent neural network.
Self-attention: the self-attention mechanism, a neural network model.
End-to-end neural network: also called sequence-to-sequence, a network structure based on sequence learning that is widely used for language models and machine translation.
Example 1
According to an embodiment of the present invention, an embodiment of a name identification method is provided. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in a different order.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. FIG. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for the name identification method. As shown in FIG. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission module 106 for communication functions. In addition, the computer terminal may include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
Under the above operating environment, the present application provides a method of identifying a name as shown in fig. 2. Fig. 2 is a flowchart of a name identification method according to an embodiment of the present invention.
Step 201, in the process of communication among a plurality of users, acquiring an interactive text in the communication process.
For example, while user A and user B are communicating, the interactive text exchanged between user A and user B is acquired.
It should be noted that the communication performed by the user may be instant communication.
Step 202, acquiring a name database, wherein the name database stores user names of at least one user group.
The user group may be a user group within an enterprise. For example, in enterprise A every employee has a "flower name" (an in-house nickname), e.g., employee A is "blue Ni" and employee B is "orange". The name database may include the flower names of all employees in an enterprise.
Step 203, identifying the user name in the interactive text based on a universal name database and the name database, wherein the universal name database stores a plurality of universal user names.
The universal name database stores all common user names found in historical data, for example, Xiaoming, Xiaohong, father, mother, girl, and the like.
The user name is identified during the users' real-time communication based on the name database and the universal name database.
For example, the interactive text is "flying late today", in which "flying" is the literal rendering of the name Feixiang. Because this name is contained in the name database, "flying" in "flying late today" is identified as a user name.
Optionally, in the name identification method provided in the embodiment of the present application, identifying the user name in the interactive text based on the name database and the universal name database includes: performing word segmentation processing on the interactive text based on the name database and the universal name database to obtain a word segmentation result, wherein the word segmentation result comprises at least one candidate user name; and identifying the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
For example, the interactive text of a plurality of users is "flying late today". Word segmentation of the interactive text yields the result "today / flying / late". The candidate user name in the word segmentation result is "flying", and "flying" is determined to be a user name in the interactive text in combination with the context information of the interactive text.
Through the above steps, the name database is combined with the universal name database: interactive text is acquired during communication among a plurality of users; a name database storing user names of at least one user group is acquired; and the user name in the interactive text is identified based on the name database and the universal name database, which stores a plurality of universal user names. User names are thereby identified accurately, which improves the identification accuracy of unique names in real-time communication and solves the technical problem of low name identification accuracy caused by poor handling of unique names in real-time communication.
Optionally, in the name identification method provided in the embodiment of the present application, performing word segmentation processing on the interactive text based on the name database and the universal name database to obtain a word segmentation result includes: performing word segmentation processing on the interactive text by adopting a forward maximum matching algorithm according to the universal name database to obtain a first word segmentation result set; performing word segmentation processing on the interactive text by adopting a forward maximum matching algorithm according to the name database to obtain a second word segmentation result set; and selecting, by adopting a word segmentation model, the word segmentation result with the highest probability from the first word segmentation result set and the second word segmentation result set as the word segmentation result of the interactive text.
In this scheme, the forward maximum matching algorithm is applied to the text with the universal name database to enumerate possible segmentation paths, giving the first word segmentation result set. The same algorithm is applied with the name database to enumerate possible segmentation paths, giving the second word segmentation result set. A word segmentation model then selects, from all candidate segmentation paths (that is, the first and second word segmentation result sets), the path with the highest probability as the word segmentation result of the interactive text.
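The segmentation step can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the base vocabulary `vocab`, the unigram table standing in for the word segmentation model, the function names, and the Chinese sentence, which is assumed to correspond to the "flying late today" example. The sketch also produces a single path per dictionary, whereas the text speaks of result sets.

```python
from math import log

def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word, falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

def path_log_prob(tokens, unigram_prob, unknown=1e-6):
    """Score a segmentation path with a unigram model (a stand-in for the
    word segmentation model, which the text does not specify)."""
    return sum(log(unigram_prob.get(tok, unknown)) for tok in tokens)

def segment(text, universal_names, group_names, vocab, unigram_prob):
    """Segment once per name database and keep the more probable path."""
    first = forward_max_match(text, vocab | universal_names)
    second = forward_max_match(text, vocab | group_names)
    best = max([first, second], key=lambda p: path_log_prob(p, unigram_prob))
    candidates = [t for t in best if t in universal_names or t in group_names]
    return best, candidates

# Hypothetical data: all entries and probabilities are invented.
vocab = {"今天", "迟到"}
universal_names = {"小明", "小红"}
group_names = {"飞翔"}
unigram_prob = {"今天": 0.05, "迟到": 0.01, "飞翔": 0.001, "飞": 0.002, "翔": 0.0005}

best, candidates = segment("今天飞翔迟到", universal_names, group_names, vocab, unigram_prob)
print(best, candidates)
```

With the universal name database alone the name is split into single characters; the path built with the group's name database keeps it whole and scores higher, so the name becomes the candidate passed to the context classifier.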
Optionally, in the name identification method provided in the embodiment of the present application, identifying, from the at least one candidate user name, a user name in the interaction text in combination with the context information of the interaction text includes: and identifying the at least one candidate user name by adopting a classification model combined with context characteristics so as to determine whether the at least one candidate user name is a user name in the interactive text.
In the above scheme, the at least one candidate user name may be identified using a CNN (convolutional neural network) classifier combined with context features, to determine whether the at least one candidate user name is a user name in the interactive text.
It should be noted that the CNN (convolutional neural network) or RNN (recurrent neural network) classifier combined with context features is a binary classification model for person names, trained on supervised (manually annotated) name recognition data. The model can generalize over the contexts in which person names appear, and is therefore effective at judging customized name candidates.
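To make the data flow of such a classifier concrete, the following pure-Python sketch runs the forward pass of a very small one-layer CNN over a context window centred on a candidate. It is not the model of the embodiment: the embedding function, filter count, window sizes and all weights are hypothetical and untrained, so the returned probability carries no meaning beyond showing the shape of the computation.

```python
import math
import random

DIM = 4     # embedding size (assumed)
WIDTH = 3   # convolution window, in tokens (assumed)

def embed(token):
    """Stand-in for learned embeddings: a fixed pseudo-random vector per token."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

def make_filters(count, seed=0):
    """Untrained convolution filters of shape WIDTH x DIM."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(WIDTH)]
            for _ in range(count)]

def conv_max_pool(vectors, kernel):
    """Slide one filter over the sequence, apply ReLU, then max-pool over time."""
    best = 0.0  # ReLU floor: negative activations are clipped to 0
    for start in range(len(vectors) - WIDTH + 1):
        window = vectors[start:start + WIDTH]
        act = sum(w * x for row, vec in zip(kernel, window) for w, x in zip(row, vec))
        best = max(best, act)
    return best

def name_probability(tokens, index, filters, out_weights, bias=0.0, context=2):
    """Probability that tokens[index] is a name, given `context` tokens per side."""
    pad = ["<pad>"] * context
    padded = pad + tokens + pad
    window = padded[index:index + 2 * context + 1]  # candidate sits in the middle
    vectors = [embed(tok) for tok in window]
    features = [conv_max_pool(vectors, k) for k in filters]
    score = sum(w * f for w, f in zip(out_weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))  # sigmoid for binary classification

filters = make_filters(3)
p = name_probability(["today", "flying", "late"], 1, filters, [0.5, -0.2, 0.1])
print(p)
```

In a trained model the embeddings, filters and output weights would be learned from the annotated name recognition data, and a candidate would be accepted when the probability exceeds a chosen threshold.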
Optionally, in the name identification method provided in the embodiment of the present application, after identifying the user name in the interactive text, the method further includes: translating the user name in the identified interactive text by adopting a user name rule to obtain a first translation result; translating the texts except the identified user name in the interactive texts by adopting a machine translation model to obtain a second translation result; and obtaining a translation result of the interactive text in the communication process based on the first translation result and the second translation result.
In real-time communication among a plurality of users, the communication content needs to be translated. After the user name in the interactive text is identified, the user name is translated according to the user name rule. For example, the interactive text in the communication content is "flying said today is a holiday", in which "flying" is identified as a user name. The name is translated into "FeiXiang", the rest of the interactive text is translated into "said today is a holiday", and the two are combined into "FeiXiang said today is a holiday", which is the translation result of the interactive text in the communication process. The scheme can be applied to Chinese-English translation in real-time communication, and solves the problem that translation models translate Chinese names (particularly customized names) poorly across different user scenarios.
It should be noted that, under name rule translation, most Chinese names can be translated from Chinese to English directly by converting the Chinese characters into pinyin; if the customized name dictionary contains names with unique patterns, such as "Liu Ji" or "Lidong", they are translated according to the corresponding rules. The machine translation model may be an end-to-end neural network translation model based on self-attention, trained on a large-scale Chinese-English parallel corpus.
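The name rule can be sketched as a lookup with a pinyin fallback. The pinyin table, the custom rule entry and the function name below are hypothetical; a real system would use a complete character-to-pinyin mapping and the group's own rule set.

```python
# Hypothetical, deliberately tiny character-to-pinyin table.
PINYIN = {"飞": "fei", "翔": "xiang", "小": "xiao", "明": "ming"}

# Hypothetical group-specific rule for a name that should not be
# transliterated character by character.
CUSTOM_RULES = {"小明": "Ming"}

def translate_name(name, custom_rules=CUSTOM_RULES, pinyin_table=PINYIN):
    """Apply a custom rule if one exists; otherwise convert each character
    to pinyin and join the capitalised syllables."""
    if name in custom_rules:
        return custom_rules[name]
    syllables = [pinyin_table.get(ch, ch) for ch in name]  # unknown chars pass through
    return "".join(s.capitalize() for s in syllables)

print(translate_name("飞翔"))   # FeiXiang
print(translate_name("小明"))   # Ming
```

Checking the custom rules first means a group can override the default transliteration for any individual name without touching the pinyin fallback.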
Optionally, in the name identification method provided in the embodiment of the present application, acquiring the name database includes: if the plurality of users belong to the same user group, acquiring the name database corresponding to that user group; if the plurality of users belong to a plurality of user groups, acquiring the name database corresponding to each user group to obtain a plurality of name databases, and taking the plurality of name databases as the name database of the at least one user group.
For example, user A and user B communicate in real time, and they may or may not belong to the same user group. If user A and user B belong to the same enterprise, and the enterprise has a customized name database covering user A and user B, the name database of that enterprise is acquired. If user A and user B do not belong to the same user group, the customized name database A of the user group to which user A belongs and the customized name database B of the user group to which user B belongs are acquired respectively. Customized name database A and customized name database B are then used together as the customized name databases for identifying user names in the interactive text during the real-time communication between user A and user B.
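The two cases can be sketched as follows. The mapping names (`group_of`, `databases`), the group identifiers and the name entries are hypothetical, and each name database is modelled as a plain set of names.

```python
def get_name_databases(users, group_of, databases):
    """Return one name database per distinct user group among the
    participants, in order of first appearance."""
    groups = []
    for user in users:
        group = group_of[user]
        if group not in groups:
            groups.append(group)
    return [databases[g] for g in groups]

def merged_names(name_databases):
    """Combine several name databases into one lookup set."""
    return set().union(*name_databases)

# Hypothetical data.
group_of = {"userA": "enterprise1", "userB": "enterprise1", "userC": "enterprise2"}
databases = {"enterprise1": {"Feixiang", "Lidong"}, "enterprise2": {"Liu Ji"}}

same_group = get_name_databases(["userA", "userB"], group_of, databases)
two_groups = get_name_databases(["userA", "userC"], group_of, databases)
print(len(same_group), len(two_groups))
print(sorted(merged_names(two_groups)))
```

When both participants share a group a single database is returned; otherwise both groups' databases are returned and can be merged for lookup.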
As shown in FIG. 3, when a plurality of users communicate by text in real time, different enterprises can compile an enterprise customized person name dictionary (corresponding to the above name database) according to their internal flower name or alias system, and the customized person name recognition module uses this dictionary to recognize customized person names in the text.
The recognized Chinese name is translated directly by rules such as converting Chinese characters into pinyin. The remaining text, excluding the recognized Chinese name, is still translated by the ordinary machine translation model. Finally, the two translated parts are combined according to their positions in the text to obtain the final machine translation result.
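The combination by position can be sketched as below. The two translation callables are stubs introduced only for illustration (a lookup table in place of the machine translation model, and a fixed mapping in place of the name rule), and the Chinese tokens are assumed to correspond to the "said today is a holiday" example.

```python
def translate_text(tokens, names, translate_name, translate_span):
    """Translate names by rule and the remaining runs of tokens by the
    machine translation model, keeping the original order."""
    parts, buffer = [], []
    for tok in tokens:
        if tok in names:
            if buffer:                      # flush the non-name text seen so far
                parts.append(translate_span(buffer))
                buffer = []
            parts.append(translate_name(tok))
        else:
            buffer.append(tok)
    if buffer:
        parts.append(translate_span(buffer))
    return " ".join(parts)

# Hypothetical stubs standing in for the real components.
NAME_RULE = {"飞翔": "FeiXiang"}
MT_TABLE = {"说今天放假": "said today is a holiday"}

def stub_name(name):
    return NAME_RULE[name]

def stub_mt(span):
    return MT_TABLE["".join(span)]

result = translate_text(["飞翔", "说", "今天", "放假"], {"飞翔"}, stub_name, stub_mt)
print(result)  # FeiXiang said today is a holiday
```

Translating each non-name run in isolation is a simplification of this sketch; the text only states that the two translated parts are merged according to their positions.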
The embodiment of the application provides a customized Chinese name recognition method, so that different users can customize person name recognition models suited to themselves. In a translation scenario in real-time communication, person names can then be identified accurately, which improves the accuracy of text translation in the communication process.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
According to an embodiment of the present invention, there is also provided an apparatus for implementing the name identification method. As shown in FIG. 4, the apparatus includes: a first acquisition unit 300, a second acquisition unit 301 and an identification unit 302.
Specifically, the first acquisition unit 300 is configured to acquire interactive text during communication among a plurality of users;
the second acquisition unit 301 is configured to acquire a name database, wherein the name database stores user names of at least one user group;
the identification unit 302 is configured to identify the user name in the interactive text according to the name database and the universal name database, wherein the universal name database stores a plurality of universal user names.
In the name identification apparatus provided by the embodiment of the application, the first acquisition unit 300 acquires interactive text during communication among a plurality of users; the second acquisition unit 301 acquires a name database, wherein the name database stores user names of at least one user group; and the identification unit 302 identifies the user name in the interactive text according to the name database and the universal name database, wherein the universal name database stores a plurality of universal user names. User names are thereby identified accurately, which improves the identification accuracy of unique names in real-time communication and solves the technical problem of low name identification accuracy caused by poor handling of unique names in real-time communication.
Optionally, in the name identification apparatus provided in this embodiment of the present application, the identifying unit 302 includes: a processing module, configured to perform word segmentation on the interactive text according to the call database and the universal name database to obtain a word segmentation result, where the word segmentation result includes at least one candidate user name; and an identification module, configured to identify the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
Optionally, in the name identification apparatus provided in this embodiment of the present application, the identification module includes: a first processing submodule, configured to perform word segmentation on the interactive text using a forward maximum matching algorithm according to the universal name database to obtain a first word segmentation result set; a second processing submodule, configured to perform word segmentation on the interactive text using a forward maximum matching algorithm according to the call database to obtain a second word segmentation result set; and a determining submodule, configured to use a word segmentation model to select, from the first word segmentation result set and the second word segmentation result set, the word segmentation result with the highest probability as the word segmentation result of the interactive text.
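The two forward maximum matching passes and the selection step can be sketched as follows. This is a minimal illustration and not the implementation of the embodiment: the lexicon contents, the toy unigram scorer standing in for the word segmentation model, and all function and variable names are assumptions made for the example, and each pass here yields a single segmentation rather than a full result set.

```python
from math import log


def forward_max_match(text, lexicon, max_len=None):
    """Segment text greedily, taking the longest lexicon entry at each position."""
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest window first and shrink until a lexicon entry matches;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def score(tokens, log_prob, unknown=-12.0):
    """Toy stand-in for the word segmentation model: sum of unigram log-probabilities."""
    return sum(log_prob.get(t, unknown) for t in tokens)


# Hypothetical data: a universal name lexicon and a group-specific call database.
general_names = {"老师", "经理"}
call_database = {"小马哥", "老王"}
log_prob = {"小马哥": log(0.02), "老师": log(0.05), "老王": log(0.03),
            "今": log(0.01), "天": log(0.01), "来": log(0.02), "吗": log(0.02)}

text = "小马哥今天来吗"
first_set = forward_max_match(text, general_names)   # pass over the universal lexicon
second_set = forward_max_match(text, call_database)  # pass over the call database
# Keep the segmentation that the scorer considers most probable.
best = max([first_set, second_set], key=lambda toks: score(toks, log_prob))
# Tokens found in either lexicon become candidate user names.
candidates = [t for t in best if t in general_names or t in call_database]
```

In this toy run only the call database contains the nickname, so the second pass keeps it as one token and the scorer prefers that segmentation.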
Optionally, in the name identification apparatus provided in this embodiment of the present application, the identification module includes: an identification submodule, configured to identify the at least one candidate user name using a classification model combined with context features, so as to determine whether the at least one candidate user name is a user name in the interactive text.
Optionally, in the name identification apparatus provided in this embodiment of the present application, the classification model includes a convolutional neural network model.
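As a rough illustration of how a convolutional classifier can take context into account, the sketch below slides kernels over the token sequence, max-pools the activations, and maps them to a probability. The text does not specify the network architecture, so everything here is an assumption: the embedding function, the kernel widths, the position-marker feature, and the untrained random weights are hypothetical, and the resulting probability carries no meaning until the model is trained.

```python
import math
import random

DIM = 4  # assumed embedding size


def embed(token):
    """Hypothetical embedding: a fixed pseudo-random vector per token."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def conv1d_max_pool(rows, kernels):
    """Slide each kernel over the sequence, apply ReLU, and max-pool over positions."""
    features = []
    for kernel in kernels:
        width = len(kernel)
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(rows) - width + 1):
            act = sum(w * x
                      for k_row, e_row in zip(kernel, rows[start:start + width])
                      for w, x in zip(k_row, e_row))
            best = max(best, act)
        features.append(best)
    return features


def classify(tokens, candidate_index, kernels, out_weights, bias=0.0):
    """Return a probability that tokens[candidate_index] is used as a user name."""
    # An extra feature marks the candidate position, so the same word can be
    # judged differently depending on the tokens around it.
    rows = [embed(t) + [1.0 if i == candidate_index else 0.0]
            for i, t in enumerate(tokens)]
    pooled = conv1d_max_pool(rows, kernels)
    logit = bias + sum(w * f for w, f in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


# Untrained, randomly initialised weights (assumption for the demo only).
rng = random.Random(0)
kernels = [[[rng.uniform(-1.0, 1.0) for _ in range(DIM + 1)] for _ in range(width)]
           for width in (2, 3)]
out_weights = [rng.uniform(-1.0, 1.0) for _ in kernels]
prob = classify(["小马哥", "今天", "来", "吗"], 0, kernels, out_weights)
```

In practice the weights would be learned from labelled interactive text, and a candidate would be accepted as a user name when the probability exceeds a chosen threshold.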
Optionally, in the name identification apparatus provided in this embodiment of the present application, the apparatus further includes: a first translation unit, configured to, after the user name in the interactive text is identified, translate the identified user name in the interactive text using a user name rule to obtain a first translation result; a second translation unit, configured to translate the text in the interactive text other than the identified user name using a machine translation model to obtain a second translation result; and a determining unit, configured to obtain a translation result of the interactive text in the communication process according to the first translation result and the second translation result.
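One possible way to combine the two translation results is sketched below. The placeholder masking, the rule table, and the `toy_mt` stand-in for a machine translation model are all assumptions made for illustration; the text only states that names are translated by a user name rule, the remaining text by a machine translation model, and that the two results are combined.

```python
def translate_with_names(tokens, name_indices, name_rules, machine_translate):
    """Translate names by rule, the rest by a translation callable, then merge."""
    # First translation result: each identified name is mapped by the name rule.
    first = {i: name_rules.get(tokens[i], tokens[i]) for i in name_indices}
    # Mask names with placeholders so the translation model leaves them untouched.
    masked = [f"<N{i}>" if i in first else tok for i, tok in enumerate(tokens)]
    # Second translation result: the text other than the identified names.
    second = machine_translate(masked)
    # Merge: restore the rule-based name translations in place of the placeholders.
    for i, name in first.items():
        second = second.replace(f"<N{i}>", name)
    return second


def toy_mt(masked_tokens):
    """Hypothetical stand-in for a machine translation model (word-by-word glossary)."""
    glossary = {"今天": "today", "来": "coming", "吗": "?"}
    return " ".join(glossary.get(t, t) for t in masked_tokens)


result = translate_with_names(
    ["小马哥", "今天", "来", "吗"],   # segmented interactive text
    {0},                              # index of the identified user name
    {"小马哥": "Brother Ma"},          # hypothetical user name rule
    toy_mt,
)
```

Masking keeps the name out of the translation model's input, so a nickname made of ordinary words is not translated literally.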
Optionally, in the name identification apparatus provided in this embodiment of the present application, the second obtaining unit 301 includes: a first obtaining module, configured to obtain the call database corresponding to a user group when the plurality of users belong to the same user group; and a second obtaining module, configured to, when the plurality of users belong to a plurality of user groups, obtain the call database corresponding to each user group to obtain a plurality of call databases, and use the plurality of call databases as the call database of the at least one user group.
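The two cases handled by the obtaining modules can be sketched with plain dictionaries. The mappings `group_of` and `database_of_group`, the user names, and the assumption that each user belongs to exactly one group are hypothetical simplifications for the example.

```python
def acquire_call_databases(users, group_of, database_of_group):
    """Return the call databases of the user groups the communicating users belong to."""
    groups = []
    for user in users:
        group = group_of[user]
        if group not in groups:
            groups.append(group)  # keep each group once, in first-seen order
    return [database_of_group[g] for g in groups]


# Hypothetical group membership and per-group call databases.
group_of = {"alice": "team_a", "bob": "team_a", "carol": "team_b"}
database_of_group = {"team_a": {"小马哥"}, "team_b": {"老王"}}

# Same user group: a single call database is obtained.
same_group = acquire_call_databases(["alice", "bob"], group_of, database_of_group)
# Several user groups: one call database per group, used together.
multi_group = acquire_call_databases(["alice", "carol"], group_of, database_of_group)
merged = set().union(*multi_group)
```

Merging the per-group databases into one set is only one option; they could equally be kept separate and consulted in turn.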
It should be noted here that the first obtaining unit 300, the second obtaining unit 301, and the identifying unit 302 correspond to steps S201 to S203 in Embodiment 1. The examples and application scenarios implemented by these units are the same as those of the corresponding steps, but are not limited to what is disclosed in Embodiment 1. It should also be noted that the above units, as part of the apparatus, may run in the computer terminal 10 provided in Embodiment 1.
Example 3
The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute program code for the following steps of the name identification method: acquiring interactive text in the communication process of a plurality of users; acquiring a call database, wherein the call database stores user names of at least one user group; and identifying the user name in the interactive text based on the call database and the universal name database, wherein the universal name database stores a plurality of universal user names.
The computer terminal may further execute program code for the following steps of the name identification method: identifying the user name in the interactive text based on the call database and the universal name database includes: performing word segmentation on the interactive text based on the call database and the universal name database to obtain a word segmentation result, wherein the word segmentation result includes at least one candidate user name; and identifying the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
The computer terminal may further execute program code for the following steps of the name identification method: performing word segmentation on the interactive text based on the call database and the universal name database to obtain a word segmentation result includes: performing word segmentation on the interactive text using a forward maximum matching algorithm according to the universal name database to obtain a first word segmentation result set; performing word segmentation on the interactive text using a forward maximum matching algorithm according to the call database to obtain a second word segmentation result set; and using a word segmentation model to select, from the first word segmentation result set and the second word segmentation result set, the word segmentation result with the highest probability as the word segmentation result of the interactive text.
The computer terminal may further execute program code for the following steps of the name identification method: identifying the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text includes: identifying the at least one candidate user name using a classification model combined with context features, so as to determine whether the at least one candidate user name is a user name in the interactive text.
The computer terminal may further execute program code for the following step of the name identification method: the classification model includes a convolutional neural network model.
The computer terminal may further execute program code for the following steps of the name identification method: after the user name in the interactive text is identified, the method further includes: translating the identified user name in the interactive text using a user name rule to obtain a first translation result; translating the text in the interactive text other than the identified user name using a machine translation model to obtain a second translation result; and obtaining a translation result of the interactive text in the communication process based on the first translation result and the second translation result.
The computer terminal may further execute program code for the following steps of the name identification method: obtaining the call database includes: if the plurality of users belong to the same user group, acquiring the call database corresponding to the user group; and if the plurality of users belong to a plurality of user groups, acquiring the call database corresponding to each user group to obtain a plurality of call databases, and using the plurality of call databases as the call database of the at least one user group.
Optionally, Fig. 4 is a block diagram of a computer terminal according to an embodiment of the present invention. As shown in Fig. 4, the computer terminal A may include one or more processors (only one is shown) and a memory.
The memory may be configured to store software programs and modules, such as the program instructions/modules corresponding to the name identification method and apparatus in the embodiments of the present invention. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, it implements the above name identification method. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and such remote memory may be connected to terminal A through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call, through the transmission device, the information and application programs stored in the memory to execute the following steps: acquiring interactive text in the communication process of a plurality of users; acquiring a call database, wherein the call database stores user names of at least one user group; and identifying the user name in the interactive text based on the call database and the universal name database, wherein the universal name database stores a plurality of universal user names.
The storage medium is further configured to store program code for performing the following steps: identifying the user name in the interactive text based on the call database and the universal name database includes: performing word segmentation on the interactive text based on the call database and the universal name database to obtain a word segmentation result, wherein the word segmentation result includes at least one candidate user name; and identifying the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
The storage medium is further configured to store program code for performing the following steps: performing word segmentation on the interactive text based on the call database and the universal name database to obtain a word segmentation result includes: performing word segmentation on the interactive text using a forward maximum matching algorithm according to the universal name database to obtain a first word segmentation result set; performing word segmentation on the interactive text using a forward maximum matching algorithm according to the call database to obtain a second word segmentation result set; and using a word segmentation model to select, from the first word segmentation result set and the second word segmentation result set, the word segmentation result with the highest probability as the word segmentation result of the interactive text.
The storage medium is further configured to store program code for identifying a user name in the interaction text from the at least one candidate user name in conjunction with the context information of the interaction text, comprising: and identifying at least one candidate user name by adopting a classification model combined with the context characteristics so as to determine whether the at least one candidate user name is a user name in the interactive text.
The storage medium is further configured to store program code for performing the following step: the classification model includes a convolutional neural network model.
The storage medium is further configured to store program code for performing the following steps, after identifying the user name in the interactive text, the method further comprising: translating the user name in the identified interactive text by adopting a user name rule to obtain a first translation result; translating the texts except the identified user name in the interactive texts by adopting a machine translation model to obtain a second translation result; and obtaining a translation result of the interactive text in the communication process based on the first translation result and the second translation result.
The storage medium is further configured to store program code for performing the following steps: obtaining the call database includes: if the plurality of users belong to the same user group, acquiring the call database corresponding to the user group; and if the plurality of users belong to a plurality of user groups, acquiring the call database corresponding to each user group to obtain a plurality of call databases, and using the plurality of call databases as the call database of the at least one user group.
The embodiment of the invention provides a scheme for a name identification method. Interactive text is acquired in the communication process of a plurality of users; a call database is acquired, wherein the call database stores user names of at least one user group; and the user name in the interactive text is identified based on the call database and the universal name database, wherein the universal name database stores a plurality of universal user names. Combining the call database with the universal name database achieves the purpose of accurately identifying the user name, which improves the identification accuracy for unique names during real-time communication and solves the technical problem that name identification accuracy is low because unique names are handled poorly in real-time communication.
It can be understood by those skilled in the art that the structure shown in Fig. 4 is only illustrative, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD. Fig. 4 does not limit the structure of the above electronic device. For example, the computer terminal 10 may include more or fewer components (e.g., network interfaces, display devices) than shown in Fig. 4, or have a different configuration from that shown in Fig. 4.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 4
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store the program code for executing the name identification method provided in Embodiment 1.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring interactive text in the communication process of a plurality of users; acquiring a call database, wherein the call database stores user names of at least one user group; and identifying the user name in the interactive text based on the call database and the universal name database, wherein the universal name database stores a plurality of universal user names.
The storage medium is further configured to store program code for performing the following steps: identifying the user name in the interactive text based on the call database and the universal name database includes: performing word segmentation on the interactive text based on the call database and the universal name database to obtain a word segmentation result, wherein the word segmentation result includes at least one candidate user name; and identifying the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
The storage medium is further configured to store program code for performing the following steps: performing word segmentation on the interactive text based on the call database and the universal name database to obtain a word segmentation result includes: performing word segmentation on the interactive text using a forward maximum matching algorithm according to the universal name database to obtain a first word segmentation result set; performing word segmentation on the interactive text using a forward maximum matching algorithm according to the call database to obtain a second word segmentation result set; and using a word segmentation model to select, from the first word segmentation result set and the second word segmentation result set, the word segmentation result with the highest probability as the word segmentation result of the interactive text.
The storage medium is further configured to store program code for identifying a user name in the interaction text from the at least one candidate user name in conjunction with the context information of the interaction text, comprising: and identifying at least one candidate user name by adopting a classification model combined with the context characteristics so as to determine whether the at least one candidate user name is a user name in the interactive text.
The storage medium is further configured to store program code for performing the following step: the classification model includes a convolutional neural network model.
The storage medium is further configured to store program code for performing the following steps, after identifying the user name in the interactive text, the method further comprising: translating the user name in the identified interactive text by adopting a user name rule to obtain a first translation result; translating the texts except the identified user name in the interactive texts by adopting a machine translation model to obtain a second translation result; and obtaining a translation result of the interactive text in the communication process based on the first translation result and the second translation result.
The storage medium is further configured to store program code for performing the following steps: obtaining the call database includes: if the plurality of users belong to the same user group, acquiring the call database corresponding to the user group; and if the plurality of users belong to a plurality of user groups, acquiring the call database corresponding to each user group to obtain a plurality of call databases, and using the plurality of call databases as the call database of the at least one user group.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (16)

1. A method for identifying a name, comprising:
acquiring interactive texts in the communication process of a plurality of users;
obtaining a call database, wherein the call database stores user names of at least one user group;
and identifying the user name in the interactive text based on a universal name database and the call database, wherein the universal name database stores a plurality of universal user names.
2. The method for identifying names according to claim 1, wherein identifying the user name in the interactive text based on the universal name database and the call database comprises:
performing word segmentation processing on the interactive text based on the universal name database and the call database to obtain a word segmentation result, wherein the word segmentation result comprises at least one candidate user name;
and identifying the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
3. The method for identifying names according to claim 2, wherein performing word segmentation processing on the interactive text based on the universal name database and the call database to obtain a word segmentation result comprises:
performing word segmentation processing on the interactive text by adopting a forward maximum matching algorithm according to the universal name database to obtain a first word segmentation result set;
performing word segmentation processing on the interactive text by adopting a forward maximum matching algorithm according to the call database to obtain a second word segmentation result set;
and selecting a word segmentation result with the highest probability from the first word segmentation result set and the second word segmentation result set by adopting a word segmentation model as a word segmentation result of the interactive text.
4. The method for identifying names according to claim 2, wherein identifying the user name in the interaction text from the at least one candidate user name in combination with the context information of the interaction text comprises:
and identifying the at least one candidate user name by adopting a classification model combined with context characteristics so as to determine whether the at least one candidate user name is a user name in the interactive text.
5. The method of claim 4, wherein the classification model comprises a convolutional neural network model.
6. The method for identifying names according to claim 1, further comprising:
after the user name in the interactive text is identified, translating the identified user name in the interactive text by adopting a user name rule to obtain a first translation result;
translating the texts except the identified user name in the interactive texts by adopting a machine translation model to obtain a second translation result;
and obtaining a translation result of the interactive text in the communication process based on the first translation result and the second translation result.
7. The method for identifying names according to claim 1, wherein obtaining the call database comprises:
if the plurality of users belong to the same user group, acquiring a call database corresponding to the user group;
if the plurality of users belong to a plurality of user groups, acquiring a call database corresponding to each user group to obtain a plurality of call databases, and taking the plurality of call databases as the call databases of at least one user group.
8. An apparatus for recognizing a name, comprising:
the first acquisition unit is used for acquiring interactive texts in the communication process of a plurality of users;
a second obtaining unit, configured to obtain a call database, where the call database stores user names of at least one user group;
and the identification unit is used for identifying the user name in the interactive text according to a universal name database and the call database, wherein the universal name database stores a plurality of universal user names.
9. The apparatus for recognizing a name according to claim 8, wherein the recognizing unit includes:
the processing module is used for performing word segmentation processing on the interactive text according to the call database and the universal name database to obtain a word segmentation result, wherein the word segmentation result comprises at least one candidate user name;
and the identification module is used for identifying the user name in the interactive text from the at least one candidate user name by combining the context information of the interactive text.
10. The apparatus for identifying names according to claim 9, wherein the identifying module comprises:
the first processing submodule is used for performing word segmentation processing on the interactive text by adopting a forward maximum matching algorithm according to the universal name database to obtain a first word segmentation result set;
the second processing submodule is used for performing word segmentation processing on the interactive text by adopting a forward maximum matching algorithm according to the call database to obtain a second word segmentation result set;
and the determining submodule is used for selecting a word segmentation result with the highest probability from the first word segmentation result set and the second word segmentation result set by adopting a word segmentation model as a word segmentation result of the interactive text.
11. The apparatus for identifying names according to claim 9, wherein the identifying module comprises:
and the identification submodule is used for identifying the at least one candidate user name by adopting a classification model combined with the context characteristics so as to determine whether the at least one candidate user name is the user name in the interactive text.
12. The apparatus for identifying names as recited in claim 11, wherein the classification model comprises a convolutional neural network model.
13. The apparatus for identifying names as set forth in claim 8, wherein the apparatus further comprises:
the first translation unit is used for translating the user name in the identified interactive text by adopting a user name rule after the user name in the interactive text is identified to obtain a first translation result;
the second translation unit is used for translating the texts except the identified user name in the interactive texts by adopting a machine translation model to obtain a second translation result;
and the determining unit is used for obtaining the translation result of the interactive text in the communication process according to the first translation result and the second translation result.
14. The apparatus for identifying a name according to claim 8, wherein the second obtaining unit includes:
the first acquisition module is used for acquiring a call database corresponding to the user group under the condition that the plurality of users belong to the same user group;
a second obtaining module, configured to, under a condition that the multiple users belong to multiple user groups, obtain a call database corresponding to each user group to obtain multiple call databases, and use the multiple call databases as a call database of the at least one user group.
15. A storage medium characterized by comprising a stored program, wherein a device in which the storage medium is located is controlled to execute the method for identifying a name according to any one of claims 1 to 7 when the program runs.
16. A processor, configured to execute a program, wherein the program executes the method for identifying a name according to any one of claims 1 to 7.
CN201910002379.2A 2019-01-02 2019-01-02 Name identification method and device, storage medium and processor Active CN111401083B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910002379.2A CN111401083B (en) 2019-01-02 2019-01-02 Name identification method and device, storage medium and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910002379.2A CN111401083B (en) 2019-01-02 2019-01-02 Name identification method and device, storage medium and processor

Publications (2)

Publication Number Publication Date
CN111401083A true CN111401083A (en) 2020-07-10
CN111401083B CN111401083B (en) 2023-05-02

Family

ID=71430188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910002379.2A Active CN111401083B (en) 2019-01-02 2019-01-02 Name identification method and device, storage medium and processor

Country Status (1)

Country Link
CN (1) CN111401083B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859970A (en) * 2020-07-23 2020-10-30 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for processing information

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070043567A1 (en) * 2005-08-22 2007-02-22 International Business Machines Corporation Techniques for aiding speech-to-speech translation
CN101093478A (en) * 2007-07-25 2007-12-26 中国科学院计算技术研究所 Method and system for identifying Chinese full name based on Chinese shortened form of entity
CN101101599A (en) * 2007-06-20 2008-01-09 精实万维软件(北京)有限公司 Method for extracting advertisement main information from web page
US20080010058A1 (en) * 2006-07-07 2008-01-10 Robert Bosch Corporation Method and apparatus for recognizing large list of proper names in spoken dialog systems
CN101727441A (en) * 2009-12-25 2010-06-09 北京工业大学 Evaluating method and evaluating system targeting Chinese name identifying system
CN103514165A (en) * 2012-06-15 2014-01-15 佳能株式会社 Method and device for identifying persons mentioned in conversation
CN103544139A (en) * 2012-07-13 2014-01-29 江苏新瑞峰信息科技有限公司 Forward word segmentation method and device based on Chinese retrieval
WO2014172428A2 (en) * 2013-04-19 2014-10-23 Google Inc. Name recognition
CN105320645A (en) * 2015-09-24 2016-02-10 天津海量信息技术有限公司 Recognition method for Chinese company name
CN105630765A (en) * 2015-12-21 2016-06-01 浙江万里学院 Place name address identifying method
CN105868184A (en) * 2016-05-10 2016-08-17 大连理工大学 Chinese name recognition method based on recurrent neural network
CN106055560A (en) * 2016-05-18 2016-10-26 上海申腾信息技术有限公司 Method for collecting data of word segmentation dictionary based on statistical machine learning method
CN107844477A (en) * 2017-10-25 2018-03-27 西安影视数据评估中心有限公司 Method and device for extracting person names from movie and television scripts
CN108255806A (en) * 2017-12-22 2018-07-06 北京奇艺世纪科技有限公司 Name recognition method and device
CN108491373A (en) * 2018-02-01 2018-09-04 北京百度网讯科技有限公司 Entity recognition method and system
CN108595435A (en) * 2018-05-03 2018-09-28 鹏元征信有限公司 Organization name recognition processing method, intelligent terminal and storage medium
CN108694168A (en) * 2018-05-11 2018-10-23 深圳云之家网络有限公司 Address processing method and device, computer device and readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
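张小衡, 王玲玲: "Recognition and Analysis of Chinese Organization Names" (中文机构名称的识别与分析) *
贾品贵, 杨一平, 卢朋: "Research on Chinese Name Recognition Based on Statistical Methods" (基于统计方法的中文姓名识别研究) *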

Also Published As

Publication number Publication date
CN111401083B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
US11392838B2 (en) Method, equipment, computing device and computer-readable storage medium for knowledge extraction based on TextCNN
US20200301954A1 (en) Reply information obtaining method and apparatus
CN109670163B (en) Information identification method, information recommendation method, template construction method and computing device
CN108710647B (en) Data processing method and device for chat robot
CN111310440B (en) Text error correction method, device and system
CN106909270A (en) Chat data input method, device and communicating terminal
CN108319888B (en) Video type identification method and device and computer terminal
CN114757176A (en) Method for obtaining target intention recognition model and intention recognition method
CN106649294A (en) Training of classification models and method and device for recognizing subordinate clauses of classification models
CN111552767A (en) Search method, search device and computer equipment
CN112906361A (en) Text data labeling method and device, electronic equipment and storage medium
CN115858741A (en) Intelligent question answering method and device suitable for multiple scenes and storage medium
CN111401083B (en) Name identification method and device, storage medium and processor
CN110134920A (en) Emoji compatible display method, device, terminal and computer readable storage medium
CN110929519B (en) Entity attribute extraction method and device
CN111274813B (en) Language sequence labeling method, device, storage medium and computer equipment
CN111291561B (en) Text recognition method, device and system
CN112084766A (en) Text processing method and device, storage medium and processor
CN109683727A (en) Data processing method and device
CN106855854A (en) Method and device for recognizing English information
CN110956034B (en) Word acquisition method and device and commodity search method
CN111897990A (en) Method, device and system for acquiring expression information
CN110232328A (en) Reference report analysis method, device and computer readable storage medium
CN111694962A (en) Data processing method and device
CN112732877B (en) Data processing method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant