CN113962198B - Method, device, equipment and medium for converting Chinese text - Google Patents


Info

Publication number
CN113962198B
Authority
CN
China
Prior art keywords
text
root
matched
chinese
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111215392.XA
Other languages
Chinese (zh)
Other versions
CN113962198A (en)
Inventor
贺光明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202111215392.XA
Publication of CN113962198A
Application granted
Publication of CN113962198B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to the field of artificial intelligence and discloses a method for converting Chinese text, comprising: obtaining a Chinese text to be converted; performing a preset matching operation on a text to be matched, based on a preset root library, to obtain a matching root and a tail text corresponding to the text to be matched; adding the matching root to a preset root container; judging whether the tail text is null; when the tail text is not null, determining the tail text as a new text to be matched and performing the preset matching operation again; and when the tail text is null, extracting all the matching roots from the root container and combining the English roots corresponding to each matching root to obtain an English text corresponding to the Chinese text. The invention can therefore convert text through simple root matching, improving both the efficiency and the accuracy of text conversion.

Description

Method, device, equipment and medium for converting Chinese text
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a method and apparatus for converting chinese text, a computer device, and a storage medium.
Background
In computer systems, data is often stored in diverse forms (for example, part of the data is stored in Chinese and another part in English). To facilitate processing, data in different forms usually needs to be normalized, i.e. converted into the same form. Converting Chinese text into the corresponding English text is a common normalization step. At present, Chinese text is commonly converted into English text by a Chinese-English text conversion model based on artificial intelligence technology. Specifically, the model first understands the semantics of the Chinese text and then generates the corresponding English text from those semantics. However, the conversion process in such a model is generally complex and poorly interpretable, and the accuracy of the converted English text is not high; for example, the same Chinese text may yield different English texts in two conversions. In addition, converting with a Chinese-English text conversion model generally requires a large amount of computation, so the conversion efficiency is generally not high. The conversion accuracy and conversion efficiency of current methods for converting Chinese text therefore still have room for improvement.
Disclosure of Invention
The technical problem to be solved by the invention is the low conversion accuracy and low conversion efficiency of existing methods for converting Chinese text.
In order to solve the technical problem, the first aspect of the present invention discloses a method for converting chinese text, the method comprising:
Obtaining a Chinese text to be converted;
Performing preset matching operation on a text to be matched based on a preset root stock to obtain a matched root and a tail text corresponding to the text to be matched, wherein the text to be matched is determined based on the Chinese text to be converted, the initial value of the text to be matched is the Chinese text, the root stock is pre-stored with a plurality of Chinese roots, each Chinese root is pre-set with a corresponding English root, and the tail text refers to a part of text of the text to be matched except the matched root;
adding the matched root word into a preset root word container;
Judging whether the tail text is a null value or not;
When judging that the tail text is not null, determining the tail text as a new text to be matched, and triggering and executing the step of executing preset matching operation on the text to be matched based on a preset root stock so as to obtain a matched root and tail text corresponding to the text to be matched;
When the tail text is judged to be null, extracting all the matched word roots from the word root container, and combining English word roots corresponding to each matched word root in the word root library according to the sequence of adding each matched word root into the word root container so as to obtain English text corresponding to the Chinese text.
The second aspect of the present invention discloses a conversion device for chinese text, the device comprising:
The acquisition module is used for acquiring the Chinese text to be converted;
The matching module is used for executing preset matching operation on a text to be matched based on a preset root stock to obtain a matching root and a tail text corresponding to the text to be matched, wherein the text to be matched is determined based on the Chinese text to be converted, the initial value of the text to be matched is the Chinese text, the root stock is pre-stored with a plurality of Chinese roots, each Chinese root is pre-set with a corresponding English root, and the tail text refers to a part of text of the text to be matched except the matching root;
the adding module is used for adding the matched root words into a preset root word container;
the judging module is used for judging whether the tail text is a null value or not;
The determining module is used for determining the tail text as a new text to be matched when the tail text is not a null value, and triggering and executing the step of executing preset matching operation on the text to be matched based on a preset root stock so as to obtain a matching root and tail text corresponding to the text to be matched;
And the combination module is used for extracting all the matched roots from the root container when the tail text is judged to be null, and combining English roots corresponding to each matched root in the root library according to the sequence of adding each matched root to the root container so as to obtain English text corresponding to the Chinese text.
A third aspect of the invention discloses a computer device comprising:
A memory storing executable program code;
a processor coupled to the memory;
the processor invokes the executable program code stored in the memory to perform some or all of the steps in the method for converting chinese text disclosed in the first aspect of the present invention.
A fourth aspect of the invention discloses a computer storage medium storing computer instructions which, when called, are adapted to perform part or all of the steps of the method of converting chinese text disclosed in the first aspect of the invention.
In the embodiments of the invention, a Chinese text to be converted is obtained; a preset matching operation is performed on the text to be matched, based on a preset root library, to obtain a matching root and a tail text corresponding to the text to be matched; the matching root is added to a preset root container; and it is judged whether the tail text is null. When the tail text is not null, the tail text is determined as a new text to be matched and the preset matching operation is performed again. When the tail text is null, all the matching roots are extracted from the root container, and the English roots corresponding to the matching roots are combined to obtain an English text corresponding to the Chinese text. Through this cyclic matching, the text to be matched is completely divided into Chinese roots in the root library, and the English roots corresponding to those Chinese roots are then combined into the English text. Chinese text can thus be converted into English text through simple root matching, without a computationally complex text conversion model, which improves conversion efficiency. In addition, conversion based on root matching is comparatively more interpretable and stable, which improves conversion accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for converting Chinese text according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a device for converting Chinese text according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a computer device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer storage medium according to an embodiment of the present invention.
Detailed Description
To enable those skilled in the art to better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
The terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish between different objects and not necessarily to describe a sequential or chronological order. Furthermore, the terms "comprise" and "have", and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, product, or device that comprises a list of steps or elements is not limited to the listed steps or elements but may optionally include other steps or elements that are not listed or that are inherent to such process, method, product, or device.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The embodiments of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
The invention discloses a method and an apparatus for converting Chinese text, a computer device, and a storage medium. A Chinese text to be converted is obtained; a preset matching operation is performed on the text to be matched, based on a preset root library, to obtain a matching root and a tail text corresponding to the text to be matched; the matching root is added to a preset root container; and it is judged whether the tail text is null. When the tail text is not null, the tail text is determined as a new text to be matched and the preset matching operation is performed again. When the tail text is null, all the matching roots are extracted from the root container, and the English roots corresponding to the matching roots are combined to obtain an English text corresponding to the Chinese text. Through this cyclic matching, the text to be matched is completely divided into Chinese roots in the root library, and the English roots corresponding to those Chinese roots are then combined into the English text. Chinese text can thus be converted into English text through simple root matching, without a computationally complex text conversion model, which improves conversion efficiency. In addition, conversion based on root matching is comparatively more interpretable and stable, which improves conversion accuracy. Details are described below.
Example 1
Referring to fig. 1, fig. 1 is a flow chart of a method for converting chinese text according to an embodiment of the invention. As shown in fig. 1, the method of converting chinese text may include the operations of:
101. Obtain the Chinese text to be converted.
In step 101, when a user needs to convert a Chinese text into English text, the user can enter the Chinese text in an interactive interface, and the Chinese text to be converted is thereby obtained. Alternatively, Chinese text present in the current computer system can be detected automatically, and the detected Chinese text is then taken as the Chinese text to be converted. For example, the Chinese text to be converted may be "enterprise client number".
102. Perform a preset matching operation on the text to be matched, based on a preset root library, to obtain a matching root and a tail text corresponding to the text to be matched, wherein the text to be matched is determined based on the Chinese text to be converted, the initial value of the text to be matched is the Chinese text, the root library pre-stores a plurality of Chinese roots, each Chinese root has a preset corresponding English root, and the tail text is the part of the text to be matched other than the matching root.
In step 102, the root library stores Chinese roots together with their corresponding English roots, for example the Chinese roots "enterprise", "client" and "number" with the English roots "company", "customer" and "quantity", respectively.
The text to be matched is matched against the Chinese roots in the root library to obtain the matching root and tail text corresponding to the text to be matched; the specific matching process is described in detail later. For example, when the text to be matched is "enterprise client number", the matching root and tail text obtained may be "enterprise" and "client number", respectively. Because the conversion method in the embodiments of the invention is a cyclic matching process, the complete Chinese text is used as the initial value of the text to be matched to start the cycle. For example, the text to be matched is initially set to the complete Chinese text "enterprise client number".
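The root library described in step 102 can be pictured as a simple lookup table. The following is a minimal Python sketch, not the patent's implementation: the names `ROOT_LIBRARY` and `english_root` are hypothetical, and the Chinese strings are assumed originals of the translated examples "enterprise", "client" and "number".

```python
from typing import Optional

# Hypothetical root library: Chinese root -> preset English root.
# The Chinese strings are assumptions reconstructed from the translated example.
ROOT_LIBRARY = {
    "企业": "company",   # "enterprise"
    "客户": "customer",  # "client"
    "数量": "quantity",  # "number"
}


def english_root(chinese_root: str) -> Optional[str]:
    """Return the English root preset for a Chinese root, or None if it is not stored."""
    return ROOT_LIBRARY.get(chinese_root)


print(english_root("企业"))  # company
print(english_root("量"))    # None: a single leftover character with no root
```

A dictionary is only one possible representation; a database table with two columns would serve the same purpose.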
103. Add the matching root to a preset root container.
In step 103, each time matching of the text to be matched is completed, the matching root obtained is added to the root container for storage. The root container can be understood as a preset storage space for storing roots. For example, the matching root "enterprise" of the text to be matched "enterprise client number" may be added to the root container.
104. Judge whether the tail text is null.
In the step 104, in the process of cyclic matching, after each time matching of the text to be matched is completed, it may be determined whether the tail text is null, so as to determine whether the chinese text has been completely divided into chinese roots in the root word library, thereby determining whether the cycle of matching needs to be ended.
105. When the tail text is judged not to be null, determine the tail text as a new text to be matched, and trigger execution of the step of performing the preset matching operation on the text to be matched, based on the preset root library, to obtain the matching root and tail text corresponding to the text to be matched.
In step 105, a non-null tail text means that the Chinese text has not yet been completely divided into Chinese roots in the root library, so the tail text is taken as the new text to be matched and matched again. If the tail text obtained by the next round of matching is still not null, another round is performed, and so on until the tail text obtained by a round of matching is null, i.e. the Chinese text has been completely divided into Chinese roots in the root library and the matching cycle can stop. This forms one complete matching cycle. For example, if the Chinese text to be converted is "enterprise client number", the initial value of the text to be matched is "enterprise client number". In the first round, the text to be matched is "enterprise client number", and the matching root and tail text obtained are "enterprise" and "client number". In the second round, the matching root and tail text obtained are "client" and "number". In the third round, the matching root obtained is "number" and the tail text is null. The Chinese text to be converted is thus completely divided into the Chinese roots "enterprise", "client" and "number".
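The matching cycle of steps 102 to 105 can be sketched as a loop that records the matching root and tail text of each round. This is an illustrative sketch under stated assumptions: `ROOT_LIBRARY`, `match_once` and `segment` are hypothetical names, the Chinese strings are assumed originals of the translated example, and the longest-prefix lookup stands in for the preset matching operation detailed later in the description.

```python
# Hypothetical root library (Chinese root -> English root); strings are assumed.
ROOT_LIBRARY = {"企业": "company", "客户": "customer", "数量": "quantity"}


def match_once(text):
    """Return (matching_root, tail_text) for the longest prefix found in the library."""
    for end in range(len(text), 0, -1):
        if text[:end] in ROOT_LIBRARY:
            return text[:end], text[end:]
    raise ValueError(f"no root in the library matches {text!r}")


def segment(chinese_text):
    """Run the matching cycle and return (root_container, per-round results)."""
    container = []           # root container; a list keeps the order of addition
    rounds = []
    to_match = chinese_text  # the initial text to be matched is the full Chinese text
    while True:
        root, tail = match_once(to_match)
        container.append(root)          # step 103
        rounds.append((root, tail))
        if tail == "":                  # step 104: tail text is null, cycle ends
            return container, rounds
        to_match = tail                 # step 105: tail becomes the new text to match


container, rounds = segment("企业客户数量")
for number, (root, tail) in enumerate(rounds, start=1):
    print(number, root, repr(tail))
# 1 企业 '客户数量'
# 2 客户 '数量'
# 3 数量 ''
```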
106. When the tail text is judged to be null, extracting all the matched word roots from the word root container, and combining English word roots corresponding to each matched word root in the word root library according to the sequence of adding each matched word root into the word root container so as to obtain English text corresponding to the Chinese text.
In step 106, when the tail text is null, the Chinese text has been completely divided into Chinese roots in the root library, and the matching cycle can end. Because the matching root is stored in the root container after each round of matching, once the cycle is complete the root container holds all the Chinese roots into which the Chinese text has been divided. The English roots corresponding to these Chinese roots are then combined to obtain the English text corresponding to the Chinese text, which completes the conversion. For example, the Chinese text "enterprise client number" is completely divided into the Chinese roots "enterprise", "client" and "number", which are added to the root container in that order; the corresponding English roots are "company", "customer" and "quantity", so the English text finally obtained is "company customer quantity".
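The combination in step 106 amounts to looking up each stored root in order and joining the results. The sketch below assumes a space as the separator, which matches the example output "company customer quantity" but is not otherwise specified in the source; `ROOT_LIBRARY`, `combine` and the Chinese strings are hypothetical.

```python
# Hypothetical root library (Chinese root -> English root); strings are assumed.
ROOT_LIBRARY = {"企业": "company", "客户": "customer", "数量": "quantity"}


def combine(root_container, separator=" "):
    """Join the English roots in the order the matching roots were added."""
    return separator.join(ROOT_LIBRARY[root] for root in root_container)


print(combine(["企业", "客户", "数量"]))  # company customer quantity
```

If the English text is meant to be used as an identifier such as a column name, a different separator (for example an underscore) could be passed; that choice is outside what the source states.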
Optionally, the method for converting Chinese text can be applied to intelligent diagnosis and remote consultation. In that case the Chinese text is a medical text, which may be an electronic healthcare record (Electronic Healthcare Record) or an electronic personal health record, including medical records, electrocardiograms, medical images and other electronic records that have preservation and reference value.
It can be seen that, by implementing the method for converting Chinese text described in FIG. 1, a Chinese text to be converted is obtained; a preset matching operation is performed on the text to be matched, based on a preset root library, to obtain a matching root and a tail text corresponding to the text to be matched; the matching root is added to a preset root container; and it is judged whether the tail text is null. When the tail text is not null, the tail text is determined as a new text to be matched and the preset matching operation is performed again. When the tail text is null, all the matching roots are extracted from the root container, and the English roots corresponding to the matching roots are combined to obtain the English text corresponding to the Chinese text. Through this cyclic matching, the text to be matched is completely divided into Chinese roots in the root library, and the English roots corresponding to those Chinese roots are then combined into the English text. Chinese text can thus be converted into English text through simple root matching, without a computationally complex text conversion model, which improves conversion efficiency. In addition, conversion based on root matching is comparatively more interpretable and stable, which improves conversion accuracy.
In an optional embodiment, the performing, based on a preset root word library, a preset matching operation on a text to be matched to obtain a matching root word and a tail text corresponding to the text to be matched, includes:
inquiring whether a target Chinese root identical to a text to be matched exists in a preset root word library;
When the target Chinese root is not existed in the root stock, deleting the last character in the text to be matched, and triggering and executing the step of inquiring whether the target Chinese root identical to the text to be matched exists in the preset root stock;
when the target Chinese root is found to exist in the root stock, determining the target Chinese root as a matching root corresponding to the text to be matched, and determining a part of text, except the matching root, in the text to be matched as a tail text corresponding to the text to be matched.
In this optional embodiment, during matching, the whole text to be matched is first compared with the Chinese roots in the root library. If a Chinese root identical to the text to be matched (i.e. the target Chinese root) is found, the matching root and the tail text are determined from the target Chinese root. If no such root is found, the last character of the text to be matched is deleted to obtain a new text to be matched, and the root library is searched again for a Chinese root identical to the new text to be matched. If it is still not found, the last character is deleted again and the search is repeated, and so on until the target Chinese root is found. For example, with the root library and the Chinese text "enterprise client number" described above, in the first round of matching the text to be matched starts as "enterprise client number" and is shortened one character at a time until it becomes "enterprise", for which the corresponding target Chinese root can be found; the matching root obtained in the first round is therefore "enterprise" and the tail text is "client number". In the second round, the text to be matched starts as "client number" and is shortened until it becomes "client", for which the target Chinese root can be found; the matching root is "client" and the tail text is "number". In the third round, the matching root obtained is "number" and the tail text is null. The Chinese text "enterprise client number" is thus completely divided into the Chinese roots "enterprise", "client" and "number".
It can be seen that implementing this alternative embodiment, in the matching process of the text to be matched, the whole text to be matched is compared with the chinese root in the root bank, if the target chinese root is found, the matching root and the tail text of the text to be matched are determined according to the target chinese root, if the target chinese root is not found, the last character of the text to be matched is deleted, so as to obtain a new text to be matched, and then the chinese root identical to the new text to be matched is found, so that the matching of the text to be matched can be achieved by a cyclic search method.
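The query-and-delete procedure of this optional embodiment can be sketched as follows. The function also returns every candidate that was queried, to make the character-by-character shortening visible. The names `ROOT_LIBRARY` and `match_root` and the Chinese strings are hypothetical, and stopping when the candidate becomes empty is an assumption added so that the sketch always terminates.

```python
# Hypothetical root library (Chinese root -> English root); strings are assumed.
ROOT_LIBRARY = {"企业": "company", "客户": "customer", "数量": "quantity"}


def match_root(text, library):
    """Query the library with the text, deleting the last character after each miss.

    Returns (matching_root, tail_text, queried_candidates); matching_root is None
    if every candidate down to a single character misses.
    """
    queried = []
    candidate = text
    while candidate:
        queried.append(candidate)
        if candidate in library:                     # target Chinese root found
            return candidate, text[len(candidate):], queried
        candidate = candidate[:-1]                   # delete the last character
    return None, text, queried


root, tail, queried = match_root("企业客户数量", ROOT_LIBRARY)
print(root, tail)  # 企业 客户数量
print(queried)     # ['企业客户数量', '企业客户数', '企业客户', '企业客', '企业']
```

Because the longest candidate is tried first, this behaves like a greedy longest-prefix match against the root library.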
In an optional embodiment, after querying that the root word library does not have the target chinese root word, before deleting the last character in the text to be matched, the method further includes:
judging whether the number of characters in the text to be matched is greater than or equal to two;
When the number of characters in the text to be matched is judged to be greater than or equal to two, triggering and executing the step of deleting the last character in the text to be matched;
and outputting an error prompt for prompting the user that matching has failed when judging that the number of characters in the text to be matched is not greater than or equal to two.
In this optional embodiment, as the target Chinese root is searched for cyclically, the text to be matched becomes progressively shorter, and it is possible that no target Chinese root is found even after the characters of the text to be matched have been exhausted; that is, the text to be matched has no matching root in the root library. It can therefore be judged whether the number of characters remaining in the text to be matched is greater than or equal to two. If it is not (that is, only one character remains), the text to be matched has no matching root in the root library; the search cycle can then be stopped and an error prompt indicating that matching has failed can be output, which avoids an endless loop and improves the reliability of the text conversion flow. In addition, after it is judged that the number of remaining characters is less than two, the user may manually check the text to be matched and the root library and then update the root library according to the result, for example by adding new Chinese roots and English roots to it.
Therefore, in the implementation of the alternative embodiment, in the process of circularly searching the target Chinese root, whether the number of the characters remained in the text to be matched is greater than or equal to two is judged, if the number of the characters is not greater than or equal to two, an error prompt of failure in matching is output, so that the text conversion process can be prevented from entering a dead cycle, and the reliability of the text conversion process is improved.
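The character-count check can be sketched as a guard placed between a failed query and the deletion of the last character. In this sketch the error prompt is modelled as a raised exception; `MatchFailedError`, `match_root_checked`, `ROOT_LIBRARY` and the Chinese strings are hypothetical, and how the prompt is actually shown to the user is not specified in the source.

```python
# Hypothetical root library (Chinese root -> English root); strings are assumed.
ROOT_LIBRARY = {"企业": "company", "客户": "customer", "数量": "quantity"}


class MatchFailedError(Exception):
    """Stands in for the error prompt telling the user that matching failed."""


def match_root_checked(text, library):
    """Return (matching_root, tail_text), or raise once fewer than two characters remain."""
    candidate = text
    while candidate not in library:
        if len(candidate) >= 2:
            candidate = candidate[:-1]   # enough characters left: delete the last one
        else:
            # Only one character (or none) remains and it is not a root: stop the cycle.
            raise MatchFailedError(f"matching failed: no root matches {text!r}")
    return candidate, text[len(candidate):]


print(match_root_checked("数量", ROOT_LIBRARY))  # ('数量', '')
try:
    match_root_checked("量", ROOT_LIBRARY)
except MatchFailedError as error:
    print(error)
```

A caller could catch the exception, show the message, and let the user add the missing Chinese and English roots to the library before retrying.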
In an optional embodiment, after determining that the number of characters in the text to be matched is not greater than or equal to two, the method further includes:
Extracting a target matching root from the root container, and deleting the target matching root from the root container, wherein the target matching root refers to the matching root added to the root container last in the root container;
Selecting, according to the target matching root, a target text from the Chinese text as a new text to be matched, wherein the target text is the part of the Chinese text that begins with the target matching root and ends with the last character of the Chinese text;
Performing preset secondary matching operation on the text to be matched based on the root word library and the target matching root word so as to obtain secondary matching root words and secondary tail texts corresponding to the text to be matched;
Adding the secondary matching root words into the root word container;
And determining the secondary tail text as a new text to be matched, and triggering and executing the step of executing preset matching operation on the text to be matched based on a preset root stock to obtain a matching root and tail text corresponding to the text to be matched.
In this optional embodiment, when it is determined that the text to be matched has no matching root in the root library, the matching root obtained by the previous round of matching may not be accurate. The matching root obtained by the previous round is therefore extracted from the root container and deleted, the text to be matched is restored to its state in the previous round, and secondary matching is performed on it (the specific secondary matching process is described in detail later). The cyclic matching process is then restarted using the secondary matching root and the secondary tail text obtained by the secondary matching, which helps the cyclic matching proceed smoothly and improves the reliability and accuracy of text conversion. For example, suppose that after two rounds of matching on the Chinese text "enterprise client number", the matching roots added to the root container are "enterprise" and a longer root that covers every character of "client number" except the last one. The single remaining character cannot be matched to any root, which indicates that the segmentation is not ideal. The longer root is then extracted from the root container and deleted, the text to be matched is restored to "client number", and secondary matching is performed on it, so that "client number" is finally divided into the more suitable matching roots "client" and "number".
Therefore, after judging that the matching root of the text to be matched does not exist in the root stock, the alternative embodiment is implemented, the matching root obtained by the last matching is extracted from the root container and deleted, then the text to be matched is restored to the last matching state, the text to be matched is subjected to the second matching, and finally the process of circular matching is started again by using the second matching root and the second tail text obtained by the second matching, so that the smooth performance of circular matching is facilitated, and the reliability and accuracy of text conversion are improved.
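The rollback described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patent's implementation: the names `match` and `convert`, the `sep` and `max_rollbacks` parameters, and the sample root library (including its Chinese strings, which this translation does not show) are hypothetical. The `max_rollbacks` bound is an addition; the text does not describe one, but under this reading of the procedure some libraries could make the rollback alternate between two roots indefinitely.

```python
# Hypothetical root library: Chinese root -> English root (assumed entries).
ROOT_LIBRARY = {
    "企业": "enterprise",
    "客户数": "clientcount",
    "客户": "client",
    "数量": "number",
}


def match(text, library, exclude=None):
    """Return (root, tail) for the longest prefix of text found in library.

    Trailing characters are dropped one at a time; a prefix equal to
    `exclude` is skipped. Returns None when no prefix matches.
    """
    for end in range(len(text), 0, -1):
        candidate = text[:end]
        if candidate in library and candidate != exclude:
            return candidate, text[end:]
    return None


def convert(chinese, library, sep="_", max_rollbacks=10):
    container = []   # matching roots, in the order they were added
    text = chinese   # text to be matched
    rollbacks = 0
    while text:      # an empty tail means segmentation is complete
        result = match(text, library)
        if result is None:
            if not container or rollbacks >= max_rollbacks:
                raise ValueError(f"matching failed at {text!r}")
            rollbacks += 1
            target = container.pop()  # most recently added matching root
            # Restore the text that starts with the removed root.
            text = chinese[sum(len(root) for root in container):]
            # Secondary matching: same search, but the removed root is skipped.
            result = match(text, library, exclude=target)
            if result is None:
                raise ValueError(f"secondary matching failed at {text!r}")
        root, text = result
        container.append(root)
    # The separator used to combine English roots is an assumption.
    return sep.join(library[root] for root in container)


print(convert("企业客户数量", ROOT_LIBRARY))  # enterprise_client_number
```

With the assumed library, the first pass yields "企业" and "客户数" and then fails on the single remaining character, so "客户数" is removed and the restored text is re-segmented into "客户" and "数量".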
In an optional embodiment, the performing, based on the root word library and the target matching root word, a preset secondary matching operation on the text to be matched to obtain a secondary matching root word and a secondary tail text corresponding to the text to be matched, includes:
Querying whether a secondary Chinese root exists in the root library, wherein a secondary Chinese root is a Chinese root that is identical to the text to be matched and different from the target matching root;
When no secondary Chinese root is found in the root library, deleting the last character of the text to be matched, and triggering execution of the step of querying whether a secondary Chinese root exists in the root library;
When a secondary Chinese root is found in the root library, determining the secondary Chinese root as the secondary matching root corresponding to the text to be matched, and determining the part of the text to be matched other than the secondary matching root as the secondary tail text corresponding to the text to be matched.
In this alternative embodiment, secondary matching proceeds in much the same way as the first matching: the last character of the text to be matched is deleted repeatedly, and after each deletion the root library is searched for a Chinese root that is identical to the text to be matched and different from the matching root obtained in the previous round (i.e. the target matching root). Continuing the example above, after the text to be matched is restored to the portion beginning with "client number", the root library is searched in turn for a Chinese root identical to the restored text, then to the restored text with its last character deleted, and so on. When the text to be matched has been shortened to "client number", an identical Chinese root exists in the root library, but it is the same as the target matching root and is therefore ignored. The text to be matched is then shortened further to "client", which is found, so the secondary matching root and the secondary tail text obtained in secondary matching are "client" and "number", respectively.
It can be seen that, in this alternative embodiment, secondary matching compares the entire text to be matched with the Chinese roots in the root library. If a secondary Chinese root identical to the text to be matched and different from the target matching root is found, the secondary matching root and the secondary tail text are determined from it. If none is found, the last character of the text to be matched is deleted to obtain a new text to be matched, and the search is repeated. Secondary matching of the text to be matched is thus achieved by this cyclic search.
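The secondary matching search can be illustrated with a short Python sketch. The function name `secondary_match`, the returned list of examined candidates, and the sample library with its Chinese strings are assumptions introduced for illustration; the translation does not show the original strings.

```python
# Hypothetical root library: Chinese root -> English root (assumed entries).
ROOT_LIBRARY = {
    "企业": "enterprise",
    "客户数": "clientcount",
    "客户": "client",
    "数量": "number",
}


def secondary_match(text, library, target_root):
    """Return (secondary_root, secondary_tail, examined), or None if nothing matches.

    `examined` lists every candidate that was looked up, in order.
    """
    examined = []
    candidate = text
    while candidate:
        examined.append(candidate)
        # A hit only counts if it differs from the target matching root.
        if candidate in library and candidate != target_root:
            return candidate, text[len(candidate):], examined
        candidate = candidate[:-1]  # delete the last character and query again
    return None


result = secondary_match("客户数量", ROOT_LIBRARY, target_root="客户数")
print(result)  # ('客户', '数量', ['客户数量', '客户数', '客户'])
```

The second candidate is present in the library but equals the target matching root, so the search continues to the third candidate.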
In an optional embodiment, after querying that the root word library does not have the secondary chinese root word, before deleting the last character in the text to be matched, the method further includes:
judging whether the number of characters in the text to be matched is greater than or equal to two;
When the number of characters in the text to be matched is judged to be greater than or equal to two, triggering and executing the step of deleting the last character in the text to be matched;
And outputting an error prompt for notifying the user that secondary matching has failed when judging that the number of characters in the text to be matched is less than two.
In this alternative embodiment, as in the first matching process, secondary matching may delete characters from the text to be matched until none can be removed without ever finding a secondary Chinese root, which means that the text to be matched has no secondary matching root in the root library. During secondary matching it is therefore judged whether the number of characters remaining in the text to be matched is greater than or equal to two. If it is not (that is, only one character remains), the text to be matched has no secondary matching root in the root library, so the search loop is stopped and an error prompt indicating that secondary matching has failed is output. This prevents an infinite loop and improves the reliability of the text conversion process.
Therefore, by implementing this alternative embodiment, the number of characters remaining in the text to be matched is checked during the cyclic search for a secondary Chinese root, and an error prompt indicating that secondary matching has failed is output when fewer than two characters remain, so that the text conversion process cannot enter an infinite loop and its reliability is improved.
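A minimal sketch of this character-count check, written in Python. The exception class `SecondaryMatchError`, the function name, and the sample library are hypothetical; how the error prompt is actually presented to the user is not specified in the text, so raising an exception is only one possible choice.

```python
class SecondaryMatchError(Exception):
    """Raised when no secondary matching root can be found (assumed error type)."""


def secondary_match_guarded(text, library, target_root):
    candidate = text
    while True:
        if candidate in library and candidate != target_root:
            return candidate, text[len(candidate):]
        # Check the length before deleting: with fewer than two characters
        # there is nothing left to shorten, so stop instead of looping forever.
        if len(candidate) < 2:
            raise SecondaryMatchError(f"secondary matching failed for {text!r}")
        candidate = candidate[:-1]


# Hypothetical library in which only the target root itself would match.
library = {"客户数": "clientcount"}
try:
    secondary_match_guarded("客户数量", library, target_root="客户数")
except SecondaryMatchError as error:
    print(error)
```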
In an optional embodiment, before the determining the secondary tail text as the new text to be matched, the method further includes:
Judging whether the secondary tail text is null;
when judging that the secondary tail text is not null, triggering and executing the step of determining the secondary tail text as a new text to be matched;
And when judging that the secondary tail text is null, triggering and executing the steps of extracting all the matched roots from the root container, and combining English roots corresponding to each matched root in the root library according to the sequence of adding each matched root to the root container so as to obtain the English text corresponding to the Chinese text.
In this alternative embodiment, if the secondary tail text is null after secondary matching, the Chinese text has been completely segmented and no further matching operation is needed. The matching roots in the root container can be extracted directly and their English roots combined to obtain the English text corresponding to the Chinese text, which prevents the text conversion process from entering an infinite loop and improves its reliability.
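The branch on the secondary tail text can be sketched as below. This is an illustrative Python fragment only: the function name `after_secondary_match`, the `("continue", ...)` and `("done", ...)` return values, the separator, and the sample data are all assumptions.

```python
def after_secondary_match(secondary_tail, container, library, sep="_"):
    """Decide what happens once secondary matching has produced a tail."""
    if secondary_tail:
        # Not null: the tail becomes the new text to be matched.
        return ("continue", secondary_tail)
    # Null: combine the English roots in the order the roots were added.
    return ("done", sep.join(library[root] for root in container))


# Hypothetical library and container contents.
library = {"企业": "enterprise", "客户": "client"}
print(after_secondary_match("", ["企业", "客户"], library))      # ('done', 'enterprise_client')
print(after_secondary_match("数量", ["企业", "客户"], library))  # ('continue', '数量')
```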
Optionally, the conversion information generated by the method for converting Chinese text may also be uploaded to a blockchain.
Specifically, the conversion information is obtained by running the method for converting Chinese text and records how the Chinese text was converted, for example the acquired Chinese text, the matching roots and the tail texts. Uploading the conversion information to the blockchain helps ensure its security and its fairness and transparency to users. A user can download the conversion information from the blockchain to verify whether it has been tampered with. The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
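The tamper-evidence property mentioned above can be illustrated with a hash-linked list of records. The sketch below is not a blockchain (it has no network, consensus mechanism or platform layers) and is not taken from the patent; the record fields and the function names `make_block`, `build_chain` and `verify` are assumptions. It uses only Python's standard `hashlib` and `json` modules.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first block


def _digest(record, prev_hash):
    body = json.dumps({"record": record, "prev": prev_hash},
                      sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def make_block(record, prev_hash):
    return {"record": record, "prev": prev_hash, "hash": _digest(record, prev_hash)}


def build_chain(records):
    chain, prev = [], GENESIS
    for record in records:
        block = make_block(record, prev)
        chain.append(block)
        prev = block["hash"]
    return chain


def verify(chain):
    """Return True only if every block's hash and back-link are intact."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block["hash"] != _digest(block["record"], block["prev"]):
            return False
        prev = block["hash"]
    return True


# Hypothetical conversion records.
chain = build_chain([
    {"chinese": "企业客户", "roots": ["企业", "客户"]},
    {"chinese": "客户数量", "roots": ["客户", "数量"]},
])
print(verify(chain))                    # True
chain[0]["record"]["roots"] = ["企业"]  # simulate tampering
print(verify(chain))                    # False
```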
Example II
Referring to fig. 2, fig. 2 is a schematic structural diagram of a Chinese text conversion apparatus according to an embodiment of the invention. As shown in fig. 2, the Chinese text conversion apparatus may include:
an obtaining module 201, configured to obtain a Chinese text to be converted;
a matching module 202, configured to perform a preset matching operation on a text to be matched based on a preset root library, so as to obtain a matching root and a tail text corresponding to the text to be matched, where the text to be matched is determined based on the Chinese text to be converted, the initial value of the text to be matched is the Chinese text, the root library stores a plurality of Chinese roots in advance, each Chinese root is preset with a corresponding English root, and the tail text is the part of the text to be matched that remains after the matching root is removed;
an adding module 203, configured to add the matching root to a preset root container;
a judging module 204, configured to judge whether the tail text is null;
a determining module 205, configured to determine the tail text as a new text to be matched when the tail text is judged not to be null, and to trigger execution of the step of performing the preset matching operation on the text to be matched based on the preset root library, so as to obtain a matching root and a tail text corresponding to the text to be matched;
and a combination module 206, configured to extract all the matching roots from the root container when the tail text is judged to be null, and to combine the English roots corresponding to the matching roots in the root library in the order in which the matching roots were added to the root container, so as to obtain the English text corresponding to the Chinese text.
For a specific description of the Chinese text conversion apparatus, reference may be made to the description of the method for converting Chinese text above, which is not repeated here.
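One way to picture how the six modules cooperate is a single class whose methods correspond to modules 201 to 206. This is a hedged Python sketch, not the apparatus itself: the class name `ChineseTextConverter`, its method names, the separator, and the sample root library with its Chinese strings are assumptions, and the rollback with secondary matching from Example one is omitted for brevity.

```python
class ChineseTextConverter:
    def __init__(self, root_library, sep="_"):
        self.root_library = root_library  # Chinese root -> English root
        self.sep = sep                    # assumed separator for the English text
        self.container = []               # root container

    def obtain(self, chinese_text):       # obtaining module 201
        return chinese_text

    def match(self, text):                # matching module 202
        candidate = text
        while candidate not in self.root_library:
            if len(candidate) < 2:        # nothing left to delete
                raise ValueError(f"matching failed at {text!r}")
            candidate = candidate[:-1]    # delete the last character
        return candidate, text[len(candidate):]

    def add(self, root):                  # adding module 203
        self.container.append(root)

    def tail_is_null(self, tail):         # judging module 204
        return tail == ""

    def combine(self):                    # combination module 206
        return self.sep.join(self.root_library[root] for root in self.container)

    def convert(self, chinese_text):
        self.container = []
        text = self.obtain(chinese_text)
        while True:
            root, tail = self.match(text)
            self.add(root)
            if self.tail_is_null(tail):
                return self.combine()
            text = tail                   # determining module 205


# Hypothetical root library.
converter = ChineseTextConverter({"企业": "enterprise", "客户": "client", "数量": "number"})
print(converter.convert("企业客户数量"))  # enterprise_client_number
```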
Example III
Referring to fig. 3, fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the invention. As shown in fig. 3, the computer device may include:
a memory 301 storing executable program code;
A processor 302 connected to the memory 301;
The processor 302 invokes the executable program code stored in the memory 301 to perform the steps of the method for converting Chinese text disclosed in the first embodiment of the present invention.
Example IV
Referring to fig. 4, an embodiment of the present invention discloses a computer storage medium 401. The computer storage medium 401 stores computer instructions which, when called, execute the steps of the method for converting Chinese text disclosed in the first embodiment of the present invention.
The apparatus embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above detailed description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the foregoing technical solutions, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium, including Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-Time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc memory, magnetic disk memory, magnetic tape memory, or any other computer-readable medium that can be used to carry or store data.
Finally, it should be noted that the method, device, computer device and storage medium for converting Chinese text disclosed in the embodiments of the invention are only preferred embodiments of the invention and are intended only to illustrate its technical solutions, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the various embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (7)

1. A method for converting Chinese text, the method comprising:
Obtaining a Chinese text to be converted;
Performing preset matching operation on a text to be matched based on a preset root stock to obtain a matched root and a tail text corresponding to the text to be matched, wherein the text to be matched is determined based on the Chinese text to be converted, the initial value of the text to be matched is the Chinese text, the root stock is pre-stored with a plurality of Chinese roots, each Chinese root is pre-set with a corresponding English root, and the tail text refers to a part of text of the text to be matched except the matched root;
adding the matched root word into a preset root word container;
Judging whether the tail text is a null value or not;
When judging that the tail text is not null, determining the tail text as a new text to be matched, and triggering and executing the step of executing preset matching operation on the text to be matched based on a preset root stock so as to obtain a matched root and tail text corresponding to the text to be matched;
When judging that the tail text is null, extracting all the matched word roots from the word root container, and combining English word roots corresponding to each matched word root in the word root library according to the sequence of adding each matched word root into the word root container so as to obtain English text corresponding to the Chinese text;
the step of executing preset matching operation on the text to be matched based on the preset root stock to obtain the matched root and tail text corresponding to the text to be matched, comprises the following steps:
inquiring whether a target Chinese root identical to a text to be matched exists in a preset root word library;
When the target Chinese root is not existed in the root stock, deleting the last character in the text to be matched, and triggering and executing the step of inquiring whether the target Chinese root identical to the text to be matched exists in the preset root stock;
When the target Chinese root is found in the root stock, determining the target Chinese root as a matching root corresponding to the text to be matched, and determining a part of text, except the matching root, in the text to be matched as a tail text corresponding to the text to be matched;
after the target Chinese root is not existed in the root stock, before deleting the last character in the text to be matched, the method further comprises:
judging whether the number of characters in the text to be matched is greater than or equal to two;
When the number of characters in the text to be matched is judged to be greater than or equal to two, triggering and executing the step of deleting the last character in the text to be matched;
Outputting an error prompt for notifying the user that matching has failed when judging that the number of characters in the text to be matched is less than two;
after judging that the number of characters in the text to be matched is not greater than or equal to two, the method further comprises:
Extracting a target matching root from the root container, and deleting the target matching root from the root container, wherein the target matching root refers to the matching root added to the root container last in the root container;
Selecting a target text from the Chinese text to serve as a new text to be matched according to the target matching word root, wherein the target text is a part of text which starts with the target matching word root and ends with the last character of the Chinese text in the Chinese text;
Performing preset secondary matching operation on the text to be matched based on the root word library and the target matching root word so as to obtain secondary matching root words and secondary tail texts corresponding to the text to be matched;
Adding the secondary matching root words into the root word container;
And determining the secondary tail text as a new text to be matched, and triggering and executing the step of executing preset matching operation on the text to be matched based on a preset root stock to obtain a matching root and tail text corresponding to the text to be matched.
2. The method for converting Chinese text according to claim 1, wherein said performing a preset secondary matching operation on the text to be matched based on the root library and the target matching root to obtain a secondary matching root and a secondary tail text corresponding to the text to be matched comprises:
Inquiring whether secondary Chinese word roots exist in the word root library, wherein the secondary Chinese word roots refer to Chinese word roots which are the same as texts to be matched and different from the target matched word roots;
When the secondary Chinese root is not found in the root stock, deleting the last character in the text to be matched, and triggering and executing the step of inquiring whether the secondary Chinese root is found in the root stock;
When the secondary Chinese root is found to exist in the root stock, determining the secondary Chinese root as a secondary matching root corresponding to the text to be matched, and determining a part of text, except the secondary matching root, in the text to be matched as a secondary tail text corresponding to the text to be matched.
3. The method for converting Chinese text according to claim 2, wherein after it is found that the secondary Chinese root does not exist in the root library and before the last character in the text to be matched is deleted, the method further comprises:
judging whether the number of characters in the text to be matched is greater than or equal to two;
When the number of characters in the text to be matched is judged to be greater than or equal to two, triggering and executing the step of deleting the last character in the text to be matched;
And outputting an error prompt for notifying the user that secondary matching has failed when judging that the number of characters in the text to be matched is less than two.
4. The method for converting Chinese text according to claim 1, wherein before said determining said secondary tail text as a new text to be matched, the method further comprises:
Judging whether the secondary tail text is null;
when judging that the secondary tail text is not null, triggering and executing the step of determining the secondary tail text as a new text to be matched;
And when judging that the secondary tail text is null, triggering and executing the steps of extracting all the matched roots from the root container, and combining English roots corresponding to each matched root in the root library according to the sequence of adding each matched root to the root container so as to obtain the English text corresponding to the Chinese text.
5. A Chinese text conversion apparatus for implementing the steps of the method for converting Chinese text according to any one of claims 1 to 4, the apparatus comprising:
The acquisition module is used for acquiring the Chinese text to be converted;
The matching module is used for executing preset matching operation on a text to be matched based on a preset root stock to obtain a matching root and a tail text corresponding to the text to be matched, wherein the text to be matched is determined based on the Chinese text to be converted, the initial value of the text to be matched is the Chinese text, the root stock is pre-stored with a plurality of Chinese roots, each Chinese root is pre-set with a corresponding English root, and the tail text refers to a part of text of the text to be matched except the matching root;
the adding module is used for adding the matched root words into a preset root word container;
the judging module is used for judging whether the tail text is a null value or not;
The determining module is used for determining the tail text as a new text to be matched when the tail text is not a null value, and triggering and executing the step of executing preset matching operation on the text to be matched based on a preset root stock so as to obtain a matching root and tail text corresponding to the text to be matched;
And the combination module is used for extracting all the matched roots from the root container when the tail text is judged to be null, and combining English roots corresponding to each matched root in the root library according to the sequence of adding each matched root to the root container so as to obtain English text corresponding to the Chinese text.
6. A computer device, the computer device comprising:
A memory storing executable program code;
a processor coupled to the memory;
The processor invokes the executable program code stored in the memory to perform the method for converting Chinese text according to any one of claims 1 to 4.
7. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method for converting Chinese text according to any one of claims 1 to 4.
CN202111215392.XA 2021-10-19 2021-10-19 Method, device, equipment and medium for converting Chinese text Active CN113962198B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111215392.XA CN113962198B (en) 2021-10-19 2021-10-19 Method, device, equipment and medium for converting Chinese text

Publications (2)

Publication Number Publication Date
CN113962198A CN113962198A (en) 2022-01-21
CN113962198B true CN113962198B (en) 2024-06-25

Family

ID=79465423

Country Status (1)

Country Link
CN (1) CN113962198B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112084796A (en) * 2020-09-15 2020-12-15 南京文图景信息科技有限公司 Multi-language place name root Chinese translation method based on Transformer deep learning model
CN112417102A (en) * 2020-11-26 2021-02-26 中国科学院自动化研究所 Voice query method, device, server and readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101334939A (en) * 2008-06-03 2008-12-31 谷祖顺 Alphabet system ordering method for multination written dictionary class-word
WO2020080300A1 (en) * 2018-10-15 2020-04-23 Ricoh Company, Ltd. Input apparatus, input method, program, and input system
CN111859972B (en) * 2020-07-28 2024-03-15 平安科技(深圳)有限公司 Entity identification method, entity identification device, computer equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN113962198A (en) 2022-01-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant