CN116702782A

CN116702782A - Text processing method, text processing device, electronic equipment and storage medium

Info

Publication number: CN116702782A
Application number: CN202310834354.5A
Authority: CN
Inventors: 李兴泉
Original assignee: Ping An Life Insurance Company of China Ltd
Current assignee: Ping An Life Insurance Company of China Ltd
Priority date: 2023-07-07
Filing date: 2023-07-07
Publication date: 2023-09-05

Abstract

The embodiments of the present application provide a text processing method, a text processing device, an electronic device and a storage medium, and belong to the field of financial technology. The method comprises: acquiring reference Chinese field data from a client; acquiring the field length of each reference Chinese field in the reference Chinese field data, and constructing a preliminary root list based on the field lengths, wherein the preliminary root list comprises a plurality of reference roots and the Chinese name of each reference root; performing root translation on each reference root in the preliminary root list to obtain the English name of each reference root, and constructing a target root list based on the English names and the preliminary root list; acquiring target Chinese field data to be processed from the client; performing field word segmentation on the target Chinese field data to obtain a target phrase; and translating the target phrase based on the target root list to obtain a target text corresponding to the target Chinese field data. The application can improve data development efficiency and text translation accuracy.

Description

Text processing method, text processing device, electronic equipment and storage medium

Technical Field

The present application relates to the field of financial technology, and in particular, to a text processing method, a text processing device, an electronic device, and a storage medium.

Background

With the development of computer technology and artificial intelligence, conventional offline services have gradually migrated online, and this migration has become an irreversible trend. Business scenarios such as online shopping, online live streaming and online transactions help institutions such as banks and online merchants to develop their business.

In service scenarios such as online shopping, online live streaming and online transactions, network users in different regions are often accustomed to communicating in different languages. Therefore, when serving network users from different regions and language backgrounds, institutions such as banks and online merchants often need to translate the text data entered by these users online and provide corresponding services based on the translation.

At present, developers usually translate text data based on their own development experience. Because different developers have different translation habits, the translated text often has to be translated again when it is reused in other development stages, which lowers data development efficiency and affects the accuracy of the translation.

Disclosure of Invention

The embodiment of the application mainly aims to provide a text processing method, a text processing device, electronic equipment and a storage medium, which aim to improve data development efficiency and text translation accuracy.

To achieve the above object, a first aspect of an embodiment of the present application provides a text processing method, including:

acquiring reference Chinese field data from a client;

acquiring the field length of each reference Chinese field in the reference Chinese field data, and constructing a preliminary root list based on the field length, wherein the preliminary root list comprises a plurality of reference roots and Chinese names of each reference root;

performing root translation on each reference root in the preliminary root list to obtain the English name of each reference root, and constructing a target root list based on the English names and the preliminary root list;

acquiring target Chinese field data to be processed from the client;

performing field word segmentation on the target Chinese field data to obtain a target phrase;

and translating the target phrase based on the target root list to obtain a target text corresponding to the target Chinese field data.

In some embodiments, acquiring the field length of each reference Chinese field in the reference Chinese field data and constructing the preliminary root list based on the field lengths includes:

acquiring the field length of each reference Chinese field;

comparing the field length with a preset length threshold;

if the field length is greater than the length threshold, taking the reference Chinese field as a first root;

if the field length is less than or equal to the length threshold, performing symbol recognition on the reference Chinese field, and performing word segmentation on the reference Chinese field according to the recognition result to obtain a plurality of second roots;

and constructing the preliminary root list according to the first roots and the second roots.

In some embodiments, performing symbol recognition on the reference Chinese field when the field length is less than or equal to the length threshold, and performing word segmentation on the reference Chinese field according to the recognition result to obtain the plurality of second roots, includes:

if the field length is less than or equal to the length threshold, identifying whether a target punctuation mark exists in the reference Chinese field;

if the recognition result is that the target punctuation mark exists in the reference Chinese field, deleting content from the reference Chinese field according to the target punctuation mark to obtain an intermediate Chinese field, and performing word segmentation on the intermediate Chinese field to obtain the second roots;

and if the recognition result is that the target punctuation mark does not exist in the reference Chinese field, performing word segmentation on the reference Chinese field to obtain the second roots.

In some embodiments, constructing the preliminary root list according to the first roots and the second roots includes:

classifying the first roots and the second roots according to preset root class labels to obtain a plurality of root sets;

and constructing the preliminary root list based on the root class labels and the root sets.

In some embodiments, performing root translation on each reference root in the preliminary root list to obtain the English name of each reference root, and constructing the target root list based on the English names and the preliminary root list, includes:

sending the preliminary root list to the client, so that the client translates each reference root in the preliminary root list to obtain the English name of each reference root;

acquiring the English name of each reference root fed back by the client according to the preliminary root list;

and embedding the English names into the preliminary root list to obtain the target root list.
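The embedding step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each entry of the preliminary root list carries a Chinese name and a root class label, and that the client's feedback arrives as a mapping from Chinese name to English name; `RootEntry` and `build_target_root_list` are hypothetical names, not part of the application.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RootEntry:
    """One row of a (hypothetical) root list."""
    chinese_name: str
    class_label: str
    english_name: Optional[str] = None


def build_target_root_list(
    preliminary: List[RootEntry], english_names: Dict[str, str]
) -> List[RootEntry]:
    """Embed client-supplied English names into a copy of the preliminary list.

    Roots the client did not translate keep english_name=None so that they
    can be flagged for a follow-up request instead of being silently dropped.
    """
    return [
        replace(entry, english_name=english_names.get(entry.chinese_name))
        for entry in preliminary
    ]
```

Because the entries are frozen and `dataclasses.replace` returns new objects, the preliminary list stays unchanged and can be re-sent to the client if some names are still missing.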

In some embodiments, after performing root translation on each reference root in the preliminary root list to obtain the English name of each reference root, and constructing the target root list based on the English names and the preliminary root list, the method further includes:

acquiring the reference roots that represent the same semantics in the target root list;

and performing name calibration on the reference roots that represent the same semantics.
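The application does not specify how roots with the same semantics are detected or which English name prevails after calibration. The Python sketch below assumes that groups of synonymous Chinese roots are supplied (for example, maintained by hand) and calibrates each group to the English name used most often within it, breaking ties alphabetically; `calibrate_names` and the dictionary form of the root list are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def calibrate_names(
    target_root_list: Dict[str, str], synonym_groups: Iterable[Set[str]]
) -> Dict[str, str]:
    """Give every root in a synonym group one shared English name."""
    calibrated = dict(target_root_list)  # do not mutate the caller's list
    for group in synonym_groups:
        present = [root for root in group if root in calibrated]
        if not present:
            continue
        counts = Counter(calibrated[root] for root in present)
        # Most frequent name first; alphabetical order makes ties deterministic.
        canonical = min(counts, key=lambda name: (-counts[name], name))
        for root in present:
            calibrated[root] = canonical
    return calibrated
```

A majority rule is only one possible policy; a reviewer-approved canonical name per group would fit the same function signature.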

In some embodiments, translating the target phrase based on the target root list to obtain the target text corresponding to the target Chinese field data includes:

traversing the target root list, and comparing the target phrase with the reference root of the target root list to obtain candidate root data;

integrating the candidate root data to obtain the target text;

and sending the target text to the client.
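The lookup-and-integrate step might look like the following Python sketch. It assumes the target root list has been reduced to a dictionary from Chinese reference root to English name, so the traversal and comparison become a hash lookup, and it assumes, as is common when naming data fields, that the candidate English names are integrated by joining them with an underscore; the function name and the separator are hypothetical choices.

```python
from typing import Dict, List


def translate_phrase(
    target_phrase: List[str], target_root_list: Dict[str, str], sep: str = "_"
) -> str:
    """Map each segmented word to its reference root's English name and join them."""
    candidates: List[str] = []
    missing: List[str] = []
    for word in target_phrase:
        english = target_root_list.get(word)
        if english is None:
            missing.append(word)
        else:
            candidates.append(english)
    if missing:
        # Surface untranslated words instead of guessing a translation.
        raise KeyError(f"no reference root for: {missing}")
    return sep.join(candidates)
```

For example, `translate_phrase(["保单", "数量"], {"保单": "policy", "数量": "count"})` yields `policy_count`. Raising on unknown words keeps every translation traceable to the shared root list.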

To achieve the above object, a second aspect of an embodiment of the present application provides a text processing apparatus, including:

the data acquisition module is configured to acquire reference Chinese field data from the client;

the preliminary root list construction module is configured to acquire the field length of each reference Chinese field in the reference Chinese field data and construct a preliminary root list based on the field lengths, wherein the preliminary root list comprises a plurality of reference roots and the Chinese name of each reference root;

the target root list construction module is configured to perform root translation on each reference root in the preliminary root list to obtain the English name of each reference root, and construct a target root list based on the English names and the preliminary root list;

the field acquisition module is configured to acquire target Chinese field data to be processed from the client;

the field word segmentation module is configured to perform field word segmentation on the target Chinese field data to obtain a target phrase;

and the phrase translation module is configured to translate the target phrase based on the target root list to obtain a target text corresponding to the target Chinese field data.

To achieve the above object, a third aspect of the embodiments of the present application provides an electronic device, which includes a memory and a processor, wherein the memory stores a computer program, and the processor implements the method described in the first aspect when executing the computer program.

To achieve the above object, a fourth aspect of the embodiments of the present application proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of the first aspect.

The present application provides a text processing method, a text processing device, an electronic device and a storage medium. Reference Chinese field data is acquired from a client; the field length of each reference Chinese field in the reference Chinese field data is acquired, and a preliminary root list is constructed based on the field lengths, wherein the preliminary root list comprises a plurality of reference roots and the Chinese name of each reference root. Because a different word segmentation mode can be selected according to the field length of each reference Chinese field, the segmentation accuracy and semantic integrity of the reference roots are effectively improved, and so is the quality of the root list constructed from them. Further, root translation is performed on each reference root in the preliminary root list to obtain its English name, and a target root list is constructed based on the English names and the preliminary root list, so that the target root list contains both the Chinese and the English information of each reference root, which makes the root information more comprehensive. Finally, target Chinese field data to be processed is acquired from the client, field word segmentation is performed on it to obtain a target phrase, and the target phrase is translated based on the target root list to obtain the corresponding target text. Because all target Chinese field data that needs translating is translated against the same target root list, one root list can be reused across various data development scenarios, which reduces the time spent on translation during data development and improves data development efficiency.
Meanwhile, unlike the conventional approach, the text processing method of the embodiments of the present application does not rely on the experience of individual developers, so the translation is more accurate. This in turn improves the efficiency and effectiveness of conversations between institutions such as banks and online merchants and network objects, allows these institutions to provide business guidance and commodity recommendations to network objects based on the translated text, improves the accuracy and quality of the recommendations, and improves commodity transaction efficiency.

Drawings

FIG. 1 is a flowchart of a text processing method provided by an embodiment of the present application;

FIG. 2 is a flowchart of step S102 in FIG. 1;

FIG. 3 is a flowchart of step S204 in FIG. 2;

FIG. 4 is a flowchart of step S205 in FIG. 2;

FIG. 5 is a flowchart of step S103 in FIG. 1;

FIG. 6 is another flowchart of a text processing method provided by an embodiment of the present application;

FIG. 7 is a flowchart of step S106 in FIG. 1;

FIG. 8 is a schematic structural diagram of a text processing device according to an embodiment of the present application;

FIG. 9 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

It should be noted that although functional modules are divided in the device diagram and a logical sequence is shown in the flowchart, in some cases the steps shown or described may be performed with a module division different from that in the device diagram, or in an order different from that in the flowchart. The terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish between similar objects and do not necessarily describe a particular sequence or chronological order.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.

First, several terms used in the present application are explained:

Artificial intelligence (AI): a new technical science that researches and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and to produce new intelligent machines that can respond in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. Artificial intelligence is also the theory, method, technology and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.

Natural language processing (NLP): NLP is a branch of artificial intelligence and an interdisciplinary field of computer science and linguistics, often referred to as computational linguistics, that processes, understands and applies human languages (e.g., Chinese and English). Natural language processing includes syntactic parsing, semantic analysis, discourse understanding, and the like. It is commonly used in machine translation, handwritten and printed character recognition, speech recognition and text-to-speech conversion, intent recognition, information extraction and filtering, text classification and clustering, public opinion analysis and opinion mining, and it involves data mining, machine learning, knowledge acquisition, knowledge engineering, artificial intelligence research, and linguistic research related to language computation.

Information extraction (IE): a text processing technique that extracts factual information of specified types, such as entities, relations and events, from natural language text and outputs it as structured data. Text data is made up of units such as sentences, paragraphs and chapters, and text information is made up of smaller units such as words, phrases, sentences and paragraphs, or combinations of these units. Extracting noun phrases, person names, place names and the like from text data are all examples of text information extraction; the information extracted by this technique can be of many types.

Root: a term in lexicology and the counterpart of "affix"; it is the morpheme in a word that carries the word's basic lexical meaning.

Based on the above, the embodiment of the application provides a text processing method, a text processing device, electronic equipment and a storage medium, which aim to improve the data development efficiency and the text translation accuracy.

The text processing method and device, the electronic device and the storage medium provided by the embodiment of the application are specifically described through the following embodiments, and the text processing method in the embodiment of the application is described first.

The embodiments of the present application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.

The embodiments of the present application provide a text processing method, which relates to the technical field of artificial intelligence. The text processing method can be applied to a terminal or a server, or can be software running in a terminal or a server. In some embodiments, the terminal may be a smartphone, a tablet computer, a notebook computer, a desktop computer, or the like; the server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and big data and artificial intelligence platforms; the software may be an application that implements the text processing method, but is not limited to the above forms.

The application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

Fig. 1 is an optional flowchart of a text processing method according to an embodiment of the present application. The method is applied to a server, and the method in Fig. 1 may include, but is not limited to, steps S101 to S106.

Step S101, acquiring reference Chinese field data from a client;

Step S102, acquiring the field length of each reference Chinese field in the reference Chinese field data, and constructing a preliminary root list based on the field lengths, wherein the preliminary root list comprises a plurality of reference roots and the Chinese name of each reference root;

Step S103, performing root translation on each reference root in the preliminary root list to obtain the English name of each reference root, and constructing a target root list based on the English names and the preliminary root list;

Step S104, acquiring target Chinese field data to be processed from the client;

Step S105, performing field word segmentation on the target Chinese field data to obtain a target phrase;

Step S106, translating the target phrase based on the target root list to obtain a target text corresponding to the target Chinese field data.

Through steps S101 to S106 of the embodiment of the present application, reference Chinese field data is acquired from a client, the field length of each reference Chinese field is acquired, and a preliminary root list comprising a plurality of reference roots and the Chinese name of each reference root is constructed based on the field lengths. Because a different word segmentation mode can be selected according to the field length of each reference Chinese field, the segmentation accuracy and semantic integrity of the reference roots are effectively improved, and so is the quality of the root list constructed from them. Further, root translation is performed on each reference root in the preliminary root list to obtain its English name, and a target root list is constructed based on the English names and the preliminary root list, so that the target root list contains both the Chinese and the English information of each reference root, which makes the root information more comprehensive. Finally, target Chinese field data to be processed is acquired from the client, field word segmentation is performed on it to obtain a target phrase, and the target phrase is translated based on the target root list to obtain the corresponding target text. Because all target Chinese field data that needs translating is translated against the same target root list, one root list can be reused across various data development scenarios, which reduces the time spent on translation during data development and improves data development efficiency.
Meanwhile, unlike the conventional approach, the text processing method of the embodiment of the present application does not rely on the experience of individual developers, which makes the translation more accurate.

In step S101 of some embodiments, when the reference Chinese field data is acquired from the client, the client may be a terminal device such as a mobile phone or a tablet computer. The target object uploads a vocabulary list file for word segmentation to the client, and after receiving the vocabulary list file, the server adds it to a preset word stock, which may be a custom ansj word stock or the like. After the vocabulary list file has been added and stored, the server sends a reference field acquisition request to the client; based on this request, the client retrieves the reference Chinese field data uploaded by the target object and sends it to the server.
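The application names a custom ansj word stock but does not show how the uploaded vocabulary is used during segmentation. As a stand-in that avoids guessing ansj's API, the Python sketch below loads the vocabulary list into an in-memory set and segments a field with forward maximum matching, a simple dictionary-based technique that is not necessarily the one ansj applies; `load_word_stock` and `forward_max_match` are hypothetical names.

```python
from typing import Iterable, List, Optional, Set


def load_word_stock(vocabulary_lines: Iterable[str]) -> Set[str]:
    """Add every non-empty line of an uploaded vocabulary file to a word stock."""
    return {line.strip() for line in vocabulary_lines if line.strip()}


def forward_max_match(
    text: str, word_stock: Set[str], max_len: Optional[int] = None
) -> List[str]:
    """Greedy left-to-right segmentation preferring the longest known word."""
    if max_len is None:
        max_len = max((len(word) for word in word_stock), default=1)
    max_len = max(1, max_len)  # guard against a zero window
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in word_stock:
                words.append(piece)
                i += size
                break
    return words
```

With a stock containing "保单" and "保单号", the field "保单号码" segments as `["保单号", "码"]`, because the longer registered word wins.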

It should be noted that the reference Chinese field data may include fields commonly used in the financial domain, such as finance, transaction, insurance, bank, online banking, interest, credit, mortgage, stock, currency, investment, fund, portfolio, and account.

In step S102 of some embodiments, when the field length of each reference Chinese field is acquired and the preliminary root list is constructed based on the field lengths, the field length of the reference Chinese field may be used as the basis for word segmentation in order to improve the accuracy of root segmentation. A length threshold is set, the field length of each reference Chinese field is compared with it, and a different root extraction mode is applied depending on the comparison: when the field length is greater than the length threshold, the reference roots are acquired in a first mode, and when the field length is less than or equal to the length threshold, they are acquired in a second mode. Finally, the roots obtained in the two modes are combined to obtain all the reference roots, and the preliminary root list is constructed from the root information of these reference roots. The preliminary root list comprises a plurality of reference roots and the Chinese name of each reference root. In this way, the word segmentation mode can be selected according to the field length of each reference Chinese field, which improves the segmentation accuracy and the semantic integrity of the reference roots.

Referring to fig. 2, in some embodiments, step S102 may include, but is not limited to, steps S201 to S205:

Step S201, acquiring the field length of each reference Chinese field;

Step S202, comparing the field length with a preset length threshold;

Step S203, if the field length is greater than the length threshold, taking the reference Chinese field as a first root;

Step S204, if the field length is less than or equal to the length threshold, performing symbol recognition on the reference Chinese field, and performing word segmentation on the reference Chinese field according to the recognition result to obtain a plurality of second roots;

Step S205, constructing a preliminary root list according to the first roots and the second roots.
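Steps S201 to S204 amount to a length-based dispatch. The Python sketch below assumes that the field length is measured in UTF-8 bytes (matching the 10-byte example given for step S202) and that the segmenter, including the punctuation handling of step S204, is passed in as a function; `split_roots`, `utf8_length` and their parameters are hypothetical.

```python
from typing import Callable, List, Tuple


def utf8_length(field: str) -> int:
    """Byte length under UTF-8 (assumed storage encoding)."""
    return len(field.encode("utf-8"))


def split_roots(
    fields: List[str],
    threshold: int,
    segment: Callable[[str], List[str]],
    length: Callable[[str], int] = utf8_length,
) -> Tuple[List[str], List[str]]:
    """Return (first_roots, second_roots) following steps S201 to S204."""
    first_roots: List[str] = []
    second_roots: List[str] = []
    for field in fields:
        if length(field) > threshold:      # S202/S203: long field kept whole
            first_roots.append(field)
        else:                              # S204: short field is segmented
            for root in segment(field):
                if root not in second_roots:  # keep each root once, in order
                    second_roots.append(root)
    return first_roots, second_roots
```

Injecting `segment` and `length` keeps the dispatch logic independent of the segmenter and of the database's notion of length.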

In step S201 of some embodiments, the field length of each reference chinese field may be obtained based on a preset length function in the database, where the preset length function may be a length () function or the like, without limitation.

In step S202 of some embodiments, the field length of each reference Chinese field is compared with the length threshold, which may be set according to actual business requirements; for example, the length threshold is set to 10 bytes. Based on the length threshold, the reference Chinese fields are divided into two types: those whose field length is greater than 10 bytes and those whose field length is less than or equal to 10 bytes, and the two types are segmented in different ways.
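Because the example threshold is expressed in bytes, the number of Chinese characters it admits depends on the character encoding. In MySQL, for instance, `LENGTH()` returns a length in bytes while `CHAR_LENGTH()` returns a length in characters. The Python sketch below, with hypothetical function names, shows that under UTF-8 (3 bytes per common Chinese character) a 10-byte threshold admits at most three characters, whereas under GBK (2 bytes per character) it admits five.

```python
def field_length(field: str, encoding: str = "utf-8") -> int:
    """Length of a field in bytes under the given encoding."""
    return len(field.encode(encoding))


def is_long_field(field: str, threshold: int = 10, encoding: str = "utf-8") -> bool:
    """True when the field would be kept whole as a first root (step S203)."""
    return field_length(field, encoding) > threshold


for name in ["保单号", "客户姓名", "缴费方式变更"]:
    print(name, len(name), field_length(name), field_length(name, "gbk"),
          is_long_field(name), is_long_field(name, encoding="gbk"))
```

The same four-character field can therefore be a first root under UTF-8 but a segmented field under GBK, so the threshold should be chosen together with the storage encoding.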

In step S203 of some embodiments, if the field length is greater than the length threshold, splitting the reference Chinese field into individual words often fails to represent its complete semantic information, so the reference Chinese field needs to be treated as a whole. That is, when the field length is greater than the length threshold, the entire reference Chinese field is taken as a first root and no word segmentation is performed on it. For example, a first root is "the number of the converted insurance policy after delivery in the thread expiration date".

In step S204 of some embodiments, if the field length is less than or equal to the length threshold, splitting the reference Chinese field into individual words can still adequately represent its complete semantic information, so the reference Chinese field is segmented into a plurality of word units. Further, to extract the key semantic information of the reference Chinese field, symbol recognition is performed on it to determine whether a target punctuation mark exists in it. The target punctuation mark separates the main content of the reference Chinese field from its supplementary content, so the reference Chinese field can be segmented in different ways, depending on whether a target punctuation mark is present, to obtain the second roots.

In step S205 of some embodiments, when the preliminary root list is constructed according to the first roots and the second roots, the first roots and the second roots may be classified according to preset root class labels to obtain a plurality of root sets. Further, the preliminary root list is constructed based on the root class labels and the root sets, wherein the preliminary root list comprises a plurality of reference roots and the Chinese name of each reference root.
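Step S205 could be sketched in Python as grouping roots into sets by class label and then flattening the sets into list rows. The class labels themselves are not enumerated in the application, so the label lookup below, including the fallback label `other`, is a hypothetical stand-in for the preset root class labels, as is the function name.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, Iterable, List, Set


def build_preliminary_root_list(
    roots: Iterable[str], label_of: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Classify roots into labelled root sets, then emit one row per root."""
    root_sets: DefaultDict[str, Set[str]] = defaultdict(set)
    for root in roots:
        root_sets[label_of(root)].add(root)  # a set removes duplicate roots
    # Sorting makes the resulting list stable across runs.
    return [
        {"class_label": label, "chinese_name": root}
        for label in sorted(root_sets)
        for root in sorted(root_sets[label])
    ]


labels = {"保单": "object", "客户": "subject", "金额": "measure"}
rows = build_preliminary_root_list(
    ["保单", "客户", "保单", "金额"], lambda r: labels.get(r, "other")
)
```

Each row holds a reference root's Chinese name and its class label; the English name is added later, when the target root list is built.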

Through steps S201 to S205, the word segmentation mode for obtaining the reference roots is selected according to the field length of each reference Chinese field. When the field length does not exceed the length threshold, the method identifies whether the reference Chinese field contains supplementary content delimited by target punctuation marks such as colons and brackets, deletes that supplementary content as appropriate, and segments the key content of the reference Chinese field to form the second roots. This effectively improves the segmentation accuracy and semantic integrity of the reference roots and the quality of the root list constructed from them, which in turn improves the accuracy of translations based on the root list, increases reuse of the root list, and improves data development efficiency.

Referring to fig. 3, in some embodiments, step S204 may include, but is not limited to, steps S301 to S303:

step S301, if the field length is less than or equal to the length threshold, identifying whether a target punctuation mark exists in the reference Chinese field;

step S302, if the recognition result is that the target punctuation mark exists in the reference Chinese field, deleting content from the reference Chinese field according to the target punctuation mark to obtain an intermediate Chinese field, and performing word segmentation on the intermediate Chinese field to obtain second roots;

step S303, if the recognition result is that no target punctuation mark exists in the reference Chinese field, performing word segmentation on the reference Chinese field to obtain second roots.

In step S301 of some embodiments, a field length less than or equal to the length threshold indicates that splitting the reference Chinese field into individual phrases still preserves its complete semantic information, so the field is segmented into several phrase units. To extract the key semantic information, symbol recognition is also performed on the reference Chinese field to determine whether it contains a target punctuation mark, which separates the main content of the field from its supplementary content. Specifically, all punctuation marks in the reference Chinese field are traversed, and the extracted marks are classified and located to determine whether any of them is a target punctuation mark, such as a colon or a bracket. A recognition result is then generated: if target punctuation marks are found, it contains those marks together with their start position information and end position information; otherwise, it indicates that the reference Chinese field contains no target punctuation mark.
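A minimal sketch of the recognition in step S301 follows. The set of target punctuation marks (full- and half-width colons and brackets), the `Mark` record, and the rule that a colon opens supplementary content running to the end of the field are all assumptions made for illustration; an empty result stands for "no target punctuation mark".

```python
from typing import List, NamedTuple

# Assumed target punctuation marks (full-width and half-width forms).
OPENING_TO_CLOSING = {"（": "）", "(": ")", "【": "】", "[": "]"}
COLONS = {"：", ":"}


class Mark(NamedTuple):
    symbol: str  # the target punctuation mark that was found
    start: int   # index where the supplementary content begins (the mark itself)
    end: int     # index one past the last character of the supplementary content


def recognize_target_marks(field: str) -> List[Mark]:
    """Traverse the field once and locate spans of supplementary content."""
    marks: List[Mark] = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch in OPENING_TO_CLOSING:
            close = field.find(OPENING_TO_CLOSING[ch], i + 1)
            if close != -1:
                marks.append(Mark(ch, i, close + 1))
                i = close + 1
                continue
            # An unmatched opening bracket is ignored.
        elif ch in COLONS:
            # Assumption: everything from the colon onward is supplementary.
            marks.append(Mark(ch, i, len(field)))
            break
        i += 1
    return marks
```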

In step S302 of some embodiments, if the recognition result shows that the reference Chinese field contains a target punctuation mark, the supplementary content in the field is located using the start and end position information of the mark contained in the recognition result, and that content is deleted. This shortens the reference Chinese field, removes its redundant information, and keeps the more important semantic information, yielding the intermediate Chinese field. The intermediate Chinese field is then segmented with a preset Jieba segmenter or a statistics-based segmentation method to obtain the second roots.
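The content deletion in step S302 can be illustrated with a small helper, assuming the recognition result has already been reduced to half-open `[start, end)` character spans of supplementary content (a hypothetical representation, not one defined by the application).

```python
from typing import List, Tuple


def delete_supplementary(field: str, spans: List[Tuple[int, int]]) -> str:
    """Remove the [start, end) spans from the field to get the intermediate field."""
    kept: List[str] = []
    cursor = 0
    for start, end in sorted(spans):
        if start > cursor:
            # Keep the main content that precedes this span.
            kept.append(field[cursor:start])
        # Overlapping spans are merged by never moving the cursor backwards.
        cursor = max(cursor, end)
    kept.append(field[cursor:])
    return "".join(kept).strip()
```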

In step S303 of some embodiments, if the recognition result shows that the reference Chinese field contains no target punctuation mark, the field carries no redundant information such as supplementary content, so it is segmented directly with a preset Jieba segmenter or a statistics-based segmentation method to obtain the second roots.
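Steps S302 and S303 leave the segmenter open (a preset Jieba segmenter or a statistics-based method). To keep the example dependency-free, the sketch below swaps in a different, simpler technique: dictionary-based forward maximum matching. The vocabulary and the maximum word length of 4 are illustrative assumptions.

```python
from typing import Iterable, List


def fmm_segment(text: str, vocabulary: Iterable[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest dictionary word at each position."""
    vocab = set(vocabulary)
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words
```

Forward maximum matching is greedy and can mis-split ambiguous strings, which is one reason statistical segmenters are usually preferred in practice.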

Through steps S301 to S303, when the field length does not exceed the length threshold, supplementary content delimited by target punctuation marks such as colons and brackets is identified and adaptively deleted, and the second roots are formed by segmenting the key content of the reference Chinese field. This effectively improves segmentation accuracy and the semantic integrity of the reference roots, and improves the quality of the root list constructed from the second roots.

Referring to fig. 4, in some embodiments, step S205 may include, but is not limited to, steps S401 to S402:

step S401, classifying the first roots and the second roots according to preset root class labels to obtain a plurality of root sets;

step S402, constructing the preliminary root list based on the root class labels and the root sets.

In step S401 of some embodiments, the preset root class labels include basic roots, business roots, aggregate roots, data warehouse layer roots, and so on. A basic root denotes a minimal unit with no specific business meaning; for example, quantity/number/times-cnt, number-no, ratio/rate-rate, and amount-amt are all basic roots. A business root carries a specific business meaning; for example, diamond-diamond, premium-prem, policy-pol, agent-agent, and client-cut are all business roots. An aggregate root describes an aggregation operation; for example, week cumulative-wtd, daily-d, monthly-m, and average-avg are all aggregate roots. To classify the first and second roots by root class label, a preset classification function (such as a softmax function) may be called. Taking the softmax function as an example, it produces a probability distribution of each first and second root over the root class labels; the label with the highest probability is selected as the class of that root, and each root is placed in the corresponding class, yielding a plurality of root sets.
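The softmax-based classification in step S401 can be sketched as below. The per-label scores are assumed to come from some upstream scoring model that the application does not specify, so they are passed in directly; the English label keys are hypothetical names for the four root classes.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

# Hypothetical keys for the root class labels, in score order.
LABELS = ["basic", "business", "aggregate", "warehouse_layer"]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into a probability distribution over the labels."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_roots(root_scores: Dict[str, Sequence[float]]) -> Dict[str, List[str]]:
    """Assign each root to the label with the highest softmax probability."""
    root_sets: Dict[str, List[str]] = defaultdict(list)
    for root, scores in root_scores.items():
        probs = softmax(scores)
        best = max(range(len(LABELS)), key=lambda k: probs[k])
        root_sets[LABELS[best]].append(root)
    return dict(root_sets)
```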

In step S402 of some embodiments, one data table is constructed per root class label: the information of the reference roots belonging to the same label (that is, the root set corresponding to that label) is collected into one table to obtain an intermediate data table, and the intermediate data tables are then merged into a single list to obtain the preliminary root list. The preliminary root list contains information such as the root class and the Chinese name of each reference root.
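Step S402 can be sketched as one intermediate table per root class label, concatenated into the preliminary root list. Rows are represented as plain dictionaries with hypothetical keys `category` and `chinese_name`; the application does not fix a storage format.

```python
from typing import Dict, List


def build_preliminary_list(root_sets: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Build one intermediate table per label, then concatenate them into one list."""
    preliminary: List[Dict[str, str]] = []
    for label, roots in root_sets.items():
        # Intermediate data table for all roots sharing this label.
        intermediate_table = [{"category": label, "chinese_name": r} for r in roots]
        preliminary.extend(intermediate_table)
    return preliminary
```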

Through steps S401 to S402, roots belonging to different root class labels can be classified and integrated conveniently: a root set is formed for each label, and the reference roots are then merged into a root list one root set at a time. This improves the classification accuracy of the preliminary root list and gives it a clearer logical structure.

Referring to fig. 5, in some embodiments, step S103 may include, but is not limited to, steps S501 to S503:

step S501, sending the preliminary root list to the client so that the client translates each reference root in it to obtain the English name of each reference root;

step S502, obtaining the English name of each reference root fed back by the client for the preliminary root list;

step S503, embedding the English names into the preliminary root list to obtain the target root list.

In step S501 of some embodiments, the server sends the preliminary root list to the client over a wireless or wired connection, so that the client can translate each reference root in the list, for example with a preset Chinese-English dictionary or by manual translation, to obtain its English name.

In step S502 of some embodiments, the server sends a translation-result request to the client over a wireless or wired connection, and in response the client returns to the server the English name of each reference root translated from the preliminary root list.

In step S503 of some embodiments, the server embeds each English name at the corresponding position in the preliminary root list according to the correspondence between the reference roots and the English names in the translation result, obtaining the target root list. The target root list comprises a plurality of reference roots and the root content information of each, including information such as the root category, Chinese name, English name, English abbreviation, and description. For example, one reference root may have the category aggregate root, the Chinese name week cumulative, the English name week_to_date, the English abbreviation wtd, and the description "accumulated from the start of the week to the current day".
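The embedding in step S503 amounts to a lookup keyed on each reference root's Chinese name. The `RootEntry` fields mirror the root content information listed above; the translation-result format (Chinese name mapped to English name, abbreviation, and description) and the example Chinese name are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RootEntry:
    category: str
    chinese_name: str
    english_name: str = ""
    english_abbr: str = ""
    description: str = ""


def embed_english(preliminary: List[RootEntry],
                  translations: Dict[str, Tuple[str, str, str]]) -> List[RootEntry]:
    """Fill the English fields in place by matching each root's Chinese name."""
    for entry in preliminary:
        if entry.chinese_name in translations:
            name, abbr, desc = translations[entry.chinese_name]
            entry.english_name, entry.english_abbr, entry.description = name, abbr, desc
    return preliminary
```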

Through steps S501 to S503, the English information of each reference root is obtained conveniently and combined with its Chinese information, so the content of the target root list is more comprehensive and complete. This helps improve the accuracy of text translation and word segmentation based on the target root list, as well as the efficiency of data development.

Referring to fig. 6, after step S103 of some embodiments, the text processing method further includes, but is not limited to, steps S601 to S602:

step S601, obtaining reference roots representing the same semantics in the target root list;

step S602, performing name calibration on the reference roots representing the same semantics.

In step S601 of some embodiments, features may be extracted from the reference roots in the target root list, for example with a named entity recognition algorithm, to obtain the semantic features of each reference root. Similarity is then computed between the semantic features of every two reference roots, for example by cosine similarity or by Euclidean distance. Pairs whose semantic similarity exceeds a preset threshold (equivalently, whose Euclidean distance falls below a preset threshold) are taken as reference roots representing the same semantics.
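A sketch of the similarity test in step S601, using cosine similarity and treating pairs at or above a threshold as representing the same semantics. The feature vectors, the example roots, and the 0.9 threshold are placeholders; how the semantic features are extracted is outside the scope of this sketch.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def same_semantic_pairs(features: Dict[str, Sequence[float]],
                        threshold: float = 0.9) -> List[Tuple[str, str]]:
    """Return every root pair whose cosine similarity reaches the threshold."""
    return [(a, b) for a, b in combinations(features, 2)
            if cosine(features[a], features[b]) >= threshold]
```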

In step S602 of some embodiments, to improve the accuracy of word segmentation based on the target root list, root content information must be established explicitly for reference roots that represent the same semantics: such roots keep different Chinese names but share the same English abbreviation. For example, two reference roots with different Chinese names that both translate as "number" are recorded as number-no and number-no, that is, with one shared abbreviation no.
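One way to realize the name calibration of step S602 is to merge same-semantic pairs into groups with a union-find structure and give every member of a group one shared English abbreviation, while each root's Chinese name (the dictionary key) stays distinct. Union-find, the example roots, and the rule that the first root's abbreviation wins are assumptions of this sketch.

```python
from typing import Dict, List, Tuple


def calibrate_abbreviations(abbr: Dict[str, str],
                            pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Give every root in a same-semantic group one shared English abbreviation."""
    parent = {root: root for root in abbr}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # assumption: the earlier root's abbreviation wins
    return {root: abbr[find(root)] for root in abbr}
```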

Through steps S601 to S602, reference roots representing the same semantics are calibrated so that they can be better distinguished, which prevents root confusion when the target root list is used for word segmentation and improves the accuracy of word segmentation and text translation based on the list.

In step S104 of some embodiments, the target object enters the target Chinese field data to be processed into the client by typing, voice input, or an interface, and the client sends the data to the server over a wireless or wired connection, so that the server obtains the target Chinese field data.

In step S105 of some embodiments, the target Chinese field data may be split into a plurality of intermediate fields at the positions of its punctuation marks, and each intermediate field is then divided further according to preset grammar rules or part-of-speech categories to obtain the target phrases.
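Step S105 can be sketched as a regular-expression split at punctuation, followed by a pluggable second-stage divider that stands in for the grammar-rule or part-of-speech division, which the application does not detail. The punctuation set is an assumption.

```python
import re
from typing import Callable, List

# Assumed punctuation (full- and half-width) and whitespace separating intermediate fields.
PUNCT = r"[，。、；：,;:\s()（）]+"


def segment_target_field(text: str, divide: Callable[[str], List[str]]) -> List[str]:
    """Split at punctuation into intermediate fields, then divide each one further."""
    intermediate_fields = [part for part in re.split(PUNCT, text) if part]
    phrases: List[str] = []
    for part in intermediate_fields:
        phrases.extend(divide(part))
    return phrases
```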

Referring to fig. 7, in some embodiments, step S106 may include, but is not limited to, steps S701 to S703:

step S701, traversing the target root list, and comparing the target phrase with the reference root of the target root list to obtain candidate root data;

step S702, integrating candidate root data to obtain a target text;

step S703, the target text is sent to the client.

In step S701 of some embodiments, the target root list is traversed and each target phrase is compared with its reference roots: the semantic relevance between the target phrase and each reference root is computed with, for example, Euclidean distance or cosine similarity; the reference root with the highest relevance is taken as the candidate root; and its root content information is extracted as the candidate root data.
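The comparison in step S701 can be sketched as an argmax over cosine similarity. Because no feature extractor is specified, this sketch assumes bag-of-characters count vectors as the features, which is a deliberate simplification; the root list is a hypothetical mapping from Chinese name to root content information.

```python
import math
from collections import Counter
from typing import Dict, Optional


def char_cosine(a: str, b: str) -> float:
    """Cosine similarity of character-count vectors (an assumed stand-in feature)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[ch] * cb[ch] for ch in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def best_candidate(phrase: str, root_list: Dict[str, dict]) -> Optional[dict]:
    """Traverse the root list and return the content info of the most relevant root."""
    best_name, best_score = None, 0.0
    for chinese_name in root_list:
        score = char_cosine(phrase, chinese_name)
        if score > best_score:  # ties keep the earlier root
            best_name, best_score = chinese_name, score
    return root_list[best_name] if best_name is not None else None
```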

In step S702 of some embodiments, the candidate root data are integrated by supplementing each target phrase with the root content information of its candidate root, producing the target text, which contains both the Chinese and the English content information of the target phrases.
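Step S702 can be sketched as pairing each target phrase with its candidate root data. The output shape, the joining of English abbreviations with underscores into a snake_case name, and the fallback to the original phrase when no candidate was found are all assumptions for illustration.

```python
from typing import Dict, List, Optional


def integrate(phrases: List[str],
              candidates: List[Optional[Dict[str, str]]]) -> Dict[str, str]:
    """Combine target phrases and their candidate root data into the target text."""
    english_parts = [
        cand["abbr"] if cand else phrase  # fall back to the phrase if nothing matched
        for phrase, cand in zip(phrases, candidates)
    ]
    return {"chinese": "".join(phrases), "english": "_".join(english_parts)}
```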

In step S703 of some embodiments, the server sends the target text to the client over a wireless or wired connection, completing the translation of the target Chinese field data so that the target text can be used in different development environments and stages.

Through steps S701 to S703, target Chinese field data requiring translation are translated based on the target root list and the corresponding translated text is generated. The same target root list can thus be reused across data development scenarios, reducing the time spent on translation during data development and improving data development efficiency. In addition, unlike conventional approaches in which translation relies on each developer's experience, this method makes translation more accurate and avoids problems such as inconsistent translated text and limited applicability caused by developers' differing translation habits.

The text processing method of the embodiments of the application obtains reference Chinese field data from a client, obtains the field length of each reference Chinese field in the data, and constructs a preliminary root list based on the field lengths; the preliminary root list comprises a plurality of reference roots and the Chinese name of each. The word segmentation approach used to obtain the reference roots is selected according to field length: when a field length does not exceed the length threshold, the method identifies whether the field contains supplementary content delimited by target punctuation marks such as colons and brackets, adaptively deletes that content, and segments the key content of the field to form second roots. This effectively improves segmentation accuracy, the semantic integrity of the reference roots, and the quality of the root list built from the second roots. Next, each reference root in the preliminary root list is translated to obtain its English name, and a target root list is constructed from the English names and the preliminary root list, so that it contains both the Chinese and the English information of the reference roots and the root information is more comprehensive.
Finally, target Chinese field data to be processed are obtained from the client and segmented into target phrases, and the target phrases are translated based on the target root list to obtain the target text corresponding to the target Chinese field data. Because any target Chinese field data requiring translation are translated against the same target root list, that list can be reused across data development scenarios, reducing the time spent on translation and improving data development efficiency. Unlike conventional approaches in which translation relies on each developer's experience, the method makes translation more accurate and avoids inconsistent translated text and limited applicability caused by developers' differing translation habits. It can also help institutions such as banks and online merchants communicate more efficiently and effectively with online users and provide business guidance and product recommendations based on the translated text, which helps improve recommendation accuracy, service quality, and transaction efficiency.

Referring to fig. 8, an embodiment of the present application further provides a text processing apparatus, which may implement the above text processing method, where the apparatus includes:

a data acquisition module 801, configured to acquire reference chinese field data from a client;

a preliminary root list construction module 802, configured to obtain a field length of each reference chinese field in the reference chinese field data, and construct a preliminary root list based on the field length, where the preliminary root list includes a plurality of reference roots and a chinese name of each reference root;

the target root list construction module 803 is configured to perform root translation on each reference root of the preliminary root list to obtain an english name of each reference root, and construct a target root list based on the english name and the preliminary root list;

a field obtaining module 804, configured to obtain target chinese field data to be processed from a client;

the field word segmentation module 805 is configured to perform field word segmentation on the target chinese field data to obtain a target phrase;

the phrase translation module 806 is configured to translate the target phrase based on the target root list to obtain a target text corresponding to the target chinese field data.

In some embodiments, the preliminary root list construction module 802 includes:

A field length obtaining unit for obtaining the field length of each reference Chinese field;

the length comparison unit is used for comparing the field length with a preset length threshold value;

the first root generation unit is used for taking the reference Chinese field as a first root if the field length is greater than the length threshold value;

the second root generation unit is used for carrying out symbol recognition on the reference Chinese field if the field length is smaller than or equal to the length threshold value, and carrying out word segmentation on the reference Chinese field according to the recognition result to obtain a plurality of second roots;

and the preliminary root list generating unit is used for constructing a preliminary root list according to the first root and the second root.

In some embodiments, the second root word generation unit comprises:

the symbol identification unit is used for identifying whether a target punctuation mark exists in the reference Chinese field if the field length is smaller than or equal to the length threshold value;

the first recognition unit is used for deleting the content of the reference Chinese field according to the target punctuation mark to obtain an intermediate Chinese field and segmenting the intermediate Chinese field to obtain a second root if the recognition result is that the target punctuation mark exists in the reference Chinese field;

And the second recognition unit is used for segmenting the reference Chinese field to obtain a second root if the recognition result is that the target punctuation mark does not exist in the reference Chinese field.

In some embodiments, the preliminary root list generation unit includes:

the root classification unit is used for classifying the first root and the second root according to a preset root category label to obtain a plurality of root sets;

and the list construction unit is used for constructing a preliminary root list based on the root class labels and the root set.

In some embodiments, the target root list construction module 803 includes:

the preliminary root list sending unit is used for sending the preliminary root list to the client so that the client can translate each reference root of the preliminary root list to obtain the English name of each reference root;

the English name acquisition unit is used for acquiring English names of each reference word root fed back by the client according to the preliminary word root list;

and the name embedding unit is used for embedding the English name into the preliminary root list to obtain a target root list.

In some embodiments, the text processing device further includes a collation module, specifically including:

the root acquiring unit is used for acquiring reference roots representing the same semantics in the target root list;

And the name calibration unit is used for performing name calibration on the reference roots representing the same semantics.

In some embodiments, phrase translation module 806 includes:

the phrase comparison unit is used for traversing the target root list, and comparing the target phrases with the reference roots of the target root list to obtain candidate root data;

the root integrating unit is used for integrating the candidate root data to obtain a target text;

and the text sending unit is used for sending the target text to the client.

The specific implementation of the text processing device is basically the same as the specific embodiment of the text processing method, and will not be repeated here.

The embodiments of the application also provide an electronic device comprising a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for connecting and enabling communication between the processor and the memory; when executed by the processor, the program implements the text processing method. The electronic device may be any intelligent terminal, such as a tablet computer or a vehicle-mounted computer.

Referring to fig. 9, fig. 9 illustrates a hardware structure of an electronic device according to another embodiment, the electronic device includes:

The processor 901 may be implemented by a general-purpose CPU (central processing unit), a microprocessor, an application-specific integrated circuit (ASIC), one or more integrated circuits, or the like, and executes related programs to implement the technical solutions provided by the embodiments of the application;

the memory 902 may be implemented as read-only memory (ROM), static storage, dynamic storage, or random access memory (RAM). The memory 902 may store an operating system and other application programs; when the technical solutions provided by the embodiments of the application are implemented in software or firmware, the relevant program code is stored in the memory 902 and invoked by the processor 901 to execute the text processing method of the embodiments of the application;

an input/output interface 903 for inputting and outputting information;

the communication interface 904 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (e.g. USB, network cable, etc.), or may implement communication in a wireless manner (e.g. mobile network, WIFI, bluetooth, etc.);

A bus 905 that transfers information between the various components of the device (e.g., the processor 901, the memory 902, the input/output interface 903, and the communication interface 904);

wherein the processor 901, the memory 902, the input/output interface 903 and the communication interface 904 are communicatively coupled to each other within the device via a bus 905.

The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to realize the text processing method.

The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The text processing method, text processing device, electronic device, and computer-readable storage medium provided by the embodiments of the application follow the same flow as the text processing method summarized above: reference roots are derived from reference Chinese field data according to field length, supplementary content delimited by target punctuation marks such as colons and brackets is removed before segmentation, the roots are translated into a target root list containing both Chinese and English information, and target Chinese field data are segmented and translated against that list. They therefore achieve the same effects: more accurate segmentation and translation, translated text that is consistent rather than dependent on individual developers' habits, reuse of one root list across data development scenarios, higher data development efficiency, and the benefits described above for institutions such as banks and online merchants in communicating with and recommending products to online users.

The embodiments described in the application are intended to explain its technical solutions more clearly and do not limit the technical solutions it provides. Those skilled in the art will appreciate that, as technology evolves and new application scenarios emerge, the technical solutions provided by the embodiments of the application are equally applicable to similar technical problems.

It will be appreciated by those skilled in the art that the solutions shown in fig. 1 to 7 do not limit the embodiments of the application, which may include more or fewer steps than shown, combine certain steps, or use different steps.

The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.

The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may represent three cases: only A is present, only B is present, or both A and B are present, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following" or a similar expression means any combination of the listed items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may be singular or plural.

In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a logical functional division, and other divisions are possible in actual implementation. For instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections via some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes multiple instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods in the various embodiments of the present application. The aforementioned storage medium includes various media capable of storing a program, such as a USB flash drive (U-disk), a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but they do not limit the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims

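1. A text processing method, the method comprising:

acquiring reference Chinese field data from a client;

acquiring the field length of each reference Chinese field in the reference Chinese field data, and constructing a preliminary root list based on the field lengths, wherein the preliminary root list comprises a plurality of reference roots and a Chinese name of each reference root;

performing root translation on each reference root of the preliminary root list to obtain an English name of each reference root, and constructing a target root list based on the English names and the preliminary root list;

acquiring target Chinese field data to be processed from the client;

performing field word segmentation on the target Chinese field data to obtain a target phrase; and

translating the target phrase based on the target root list to obtain a target text corresponding to the target Chinese field data.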

2. The text processing method according to claim 1, wherein acquiring the field length of each reference Chinese field in the reference Chinese field data and constructing the preliminary root list based on the field lengths comprises:

acquiring the field length of each reference Chinese field;

comparing the field length with a preset length threshold;

3. The text processing method according to claim 2, wherein performing symbol recognition on the reference Chinese field if the field length is less than or equal to the length threshold, and performing word segmentation on the reference Chinese field according to the recognition result to obtain a plurality of second root words, comprises:

4. The text processing method according to claim 2, wherein constructing the preliminary root list from the first root words and the second root words comprises:

5. The text processing method according to claim 1, wherein performing root translation on each reference root of the preliminary root list to obtain an English name of each reference root, and constructing a target root list based on the English names and the preliminary root list, comprises:

6. The text processing method according to claim 1, wherein after performing root translation on each reference root of the preliminary root list to obtain an English name of each reference root, and constructing a target root list based on the English names and the preliminary root list, the method further comprises:

7. The text processing method according to any one of claims 1 to 6, wherein translating the target phrase based on the target root list to obtain a target text corresponding to the target Chinese field data comprises:

integrating the candidate root data to obtain the target text;

and sending the target text to the client.

8. A text processing apparatus, the apparatus comprising:

9. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the text processing method according to any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the text processing method of any one of claims 1 to 7.
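The translation step recited in claim 7 (translating a segmented target phrase using the target root list to obtain a target text) can be illustrated with a minimal sketch. It rests on assumptions the claims do not state: the target root list is modeled as a plain mapping from Chinese root names to English names, field word segmentation is done by forward maximum matching against that mapping, and the target text is formed by joining the English names with underscores (the snake_case convention common for database field names). The function names `segment_field` and `translate_field` and the sample roots are hypothetical.

```python
from typing import Dict, List


def segment_field(field: str, root_list: Dict[str, str]) -> List[str]:
    """Split a Chinese field into roots by forward maximum matching.

    Characters not covered by any root are kept as single-character tokens.
    """
    max_len = max((len(root) for root in root_list), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(field):
        # Try the longest candidate first, shrinking until a root matches;
        # a single character is always accepted so the loop always advances.
        for size in range(min(max_len, len(field) - i), 0, -1):
            candidate = field[i:i + size]
            if candidate in root_list or size == 1:
                tokens.append(candidate)
                i += size
                break
    return tokens


def translate_field(field: str, root_list: Dict[str, str]) -> str:
    """Translate each root via the root list and join the English names."""
    tokens = segment_field(field, root_list)
    # Unknown tokens are kept as-is so gaps in the root list stay visible.
    english = [root_list.get(token, token) for token in tokens]
    return "_".join(english)


# Hypothetical target root list: Chinese name -> English name.
ROOT_LIST = {
    "保单": "policy",
    "生效": "effective",
    "日期": "date",
    "客户": "customer",
    "名称": "name",
}

print(translate_field("保单生效日期", ROOT_LIST))  # policy_effective_date
print(translate_field("客户名称", ROOT_LIST))  # customer_name
```

Forward maximum matching is only one possible segmentation strategy; the application does not say which strategy is used, and a dictionary-based segmenter or a different naming convention for the target text could be substituted without changing the overall flow.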