CN111026319B - Intelligent text processing method and device, electronic equipment and storage medium - Google Patents

Intelligent text processing method and device, electronic equipment and storage medium

Info

Publication number
CN111026319B
CN111026319B (application CN201911362272.5A)
Authority
CN
China
Prior art keywords
text
word
text content
matched
processing model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911362272.5A
Other languages
Chinese (zh)
Other versions
CN111026319A (en)
Inventor
田植良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010194392.5A priority Critical patent/CN111414122B/en
Priority to CN201911362272.5A priority patent/CN111026319B/en
Publication of CN111026319A publication Critical patent/CN111026319A/en
Application granted granted Critical
Publication of CN111026319B publication Critical patent/CN111026319B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04842Selection of displayed objects or displayed text elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures

Abstract

The invention provides an intelligent text processing method, comprising: obtaining text content corresponding to a selection operation on a touch screen; extracting a feature vector matched with the text content; determining at least one word-level hidden variable corresponding to the text content according to the feature vector; generating, from the at least one word-level hidden variable, candidate words corresponding to the hidden variables and the selection probability of each candidate word; selecting at least one candidate word, according to its selection probability, to form a target text corresponding to the text content; and displaying the target text on the touch screen in a display mode corresponding to the selection operation. The invention also provides an intelligent text processing apparatus, an electronic device, and a storage medium. With this method and apparatus, the target text a user intends to select can be predicted while the user is selecting text through the touch screen, and the corresponding target text is output for the user to choose.

Description

Intelligent text processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to information processing technologies, and in particular, to an intelligent text processing method and apparatus, an electronic device, and a storage medium.
Background
In conventional technology, when text information is selected on a touch-screen electronic device (a mobile phone, an iPad, etc.), the limited operable area of the touch screen and the habit of one-handed operation mean that the user cannot select text accurately by manually controlling a cursor, and often fails to select the intended text, which hurts both selection speed and selection accuracy. Artificial intelligence is the theory, method, and technology of using a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence: to perceive the environment, acquire knowledge, and use that knowledge to obtain the best result. Research on the design principles and implementation methods of intelligent machines gives machines the capabilities of perception, reasoning, and decision making; in the field of language processing, it enables the recognition of text information by a digital computer or a machine controlled by one.
Disclosure of Invention
In view of this, embodiments of the present invention provide an intelligent text processing method and apparatus, an electronic device, and a storage medium. The technical solution of the embodiments of the present invention is implemented as follows:
the embodiment of the invention provides an intelligent text processing method, which comprises the following steps:
acquiring text content corresponding to a selection operation on the touch screen;
extracting a feature vector matched with the text content;
determining at least one word-level hidden variable corresponding to the text content according to the feature vector;
generating, according to the at least one word-level hidden variable, candidate words corresponding to the word-level hidden variables and the selection probability of each candidate word;
selecting at least one candidate word, according to its selection probability, to form a target text corresponding to the text content;
and displaying the target text on the touch screen in a display mode corresponding to the selection operation.
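The claimed steps above can be sketched end to end as a toy pipeline. Everything concrete here — the character-count features, the length-based candidate scoring, the vocabulary — is an illustrative assumption, not the patent's actual model:

```python
def extract_features(text):
    # Toy feature vector: per-character codes (stand-in for a learned encoder)
    return [ord(c) % 7 for c in text]

def to_hidden_variables(features, n=3):
    # Toy word-level hidden variables: fixed-size summaries of the features
    chunk = max(1, len(features) // n)
    return [sum(features[i:i + chunk]) for i in range(0, len(features), chunk)][:n]

def generate_candidates(hidden, vocab):
    # Score each vocabulary word against the hidden variables, then normalize
    # the scores into selection probabilities
    scores = {w: sum(h * (len(w) % 5 + 1) for h in hidden) for w in vocab}
    total = sum(scores.values()) or 1
    return {w: s / total for w, s in scores.items()}

def select_target(candidates, k=2):
    # Keep the k most probable candidate words as the target text
    top = sorted(candidates, key=candidates.get, reverse=True)[:k]
    return " ".join(top)

vocab = ["intelligent", "text", "processing", "touch"]
features = extract_features("selected text content")
hidden = to_hidden_variables(features)
candidates = generate_candidates(hidden, vocab)
print(select_target(candidates))
```

The real method would replace each toy function with a trained component, but the data flow — features, hidden variables, scored candidates, target text — follows the enumerated steps.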
The embodiment of the invention also provides an intelligent text processing device, which comprises:
the information transmission module is used for acquiring text contents corresponding to the selection operation in the touch screen;
the information processing module is used for extracting a characteristic vector matched with the text content;
the information processing module is used for determining at least one word-level hidden variable corresponding to the text content according to the feature vector;
the information processing module is used for generating candidate words corresponding to the hidden variables of the word level and the selected probability of the candidate words according to the hidden variables of the at least one word level;
the information processing module is used for selecting at least one candidate word to form a target text corresponding to the text content according to the selection probability of the candidate word;
and the information processing module is used for displaying the target text on the touch screen in a display mode corresponding to the selection operation.
In the above scheme,
the information processing module is used for triggering the corresponding word segmentation libraries according to the text parameter information carried by the text content;
the information processing module is used for carrying out word segmentation processing on the text content through the triggered word segmentation library word dictionary to form different word level feature vectors;
and the information processing module is used for denoising the different word-level feature vectors to form a word-level feature vector set corresponding to the text content.
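As a concrete illustration of segmenting text through a word segmentation library's dictionary, the following is a minimal greedy forward maximum-matching segmenter. The dictionary contents and the four-character lookahead window are assumptions for the example, not the patent's actual segmentation libraries:

```python
def max_match_segment(text, dictionary, max_len=4):
    """Greedy forward maximum matching against a word dictionary.

    Tries the longest dictionary match at each position; falls back to a
    single character when nothing matches.
    """
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

dictionary = {"智能", "文本", "处理", "方法"}
print(max_match_segment("智能文本处理方法", dictionary))
# → ['智能', '文本', '处理', '方法']
```

Different segmentation libraries correspond, in this sketch, to different `dictionary` contents producing different word-level splits of the same text.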
In the above scheme,
the information processing module is used for determining a dynamic noise threshold value matched with the use environment of the text processing model;
the information processing module is used for carrying out denoising processing on the different word-level feature vectors according to the dynamic noise threshold value and triggering a dynamic word segmentation strategy matched with the dynamic noise threshold value;
and the information processing module is used for performing word segmentation processing on the text content according to a dynamic word segmentation strategy matched with the dynamic noise threshold value to form a dynamic word level feature vector set corresponding to the text content.
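A dynamic noise threshold derived from the data itself could look like the following sketch, where the threshold is simply the mean component magnitude across all feature vectors — an assumption for illustration; the patent does not specify how the threshold is computed:

```python
def dynamic_noise_threshold(vectors):
    # Toy dynamic threshold: mean magnitude over every component (assumption)
    flat = [abs(x) for v in vectors for x in v]
    return sum(flat) / len(flat)

def denoise(vectors, threshold):
    # Zero out components whose magnitude falls below the threshold
    return [[x if abs(x) >= threshold else 0.0 for x in v] for v in vectors]

vectors = [[0.9, 0.05, 0.7], [0.02, 0.8, 0.1]]
t = dynamic_noise_threshold(vectors)
print(denoise(vectors, t))
# → [[0.9, 0.0, 0.7], [0.0, 0.8, 0.0]]
```

Because the threshold is recomputed from each batch of vectors, it adapts to the usage environment, whereas the fixed-threshold variant below would pass a constant instead.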
In the above scheme,
the information processing module is used for determining a fixed noise threshold value corresponding to the use environment of the text processing model;
the information processing module is used for denoising the different word-level feature vectors according to the fixed noise threshold and triggering a fixed word segmentation strategy matched with the fixed noise threshold;
and the information processing module is used for performing word segmentation processing on the text content according to a fixed word segmentation strategy matched with the fixed noise threshold, so as to form a fixed word-level feature vector set corresponding to the text content.
In the above scheme,
the information processing module is used for performing word segmentation processing on the text content to form a word segmentation processing result;
the information processing module is used for performing stop-word removal on the text content in response to the word segmentation result, so as to form text keywords matched with the text content;
and the information processing module is used for determining a part-of-speech tagging result matched with the text content according to the text keywords matched with the text content and forming a part-of-speech characteristic vector set corresponding to the text content.
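The segmentation, stop-word removal, and part-of-speech tagging chain described above can be illustrated with plain dictionary lookups standing in for real tools; the stop-word list and tag dictionary here are invented purely for the example:

```python
STOP_WORDS = {"the", "is", "a", "of"}  # toy stop-word list (assumption)
POS_DICT = {"singer": "noun", "famous": "adj", "sings": "verb"}  # toy tag map

def keywords(tokens):
    # Remove stop words to keep the text keywords
    return [t for t in tokens if t.lower() not in STOP_WORDS]

def tag(tokens):
    # Dictionary lookup as a stand-in for a real part-of-speech tagger
    return [(t, POS_DICT.get(t.lower(), "unknown")) for t in tokens]

tokens = ["the", "famous", "singer", "sings"]
print(tag(keywords(tokens)))
# → [('famous', 'adj'), ('singer', 'noun'), ('sings', 'verb')]
```

The `(word, tag)` pairs correspond to the part-of-speech feature vector set the module forms from the text keywords.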
In the above scheme, the apparatus further comprises:
the training module is used for acquiring a training sample matched with the use environment of the text processing model;
the training module is used for extracting a feature set matched with the training sample through the text processing model;
and the training module is used for training the text processing model according to the feature set matched with the training sample and the corresponding target text label so as to determine the model parameters matched with the text processing model.
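As a stand-in for training the text processing model on feature sets and their target labels, a minimal perceptron-style loop shows the shape of "extract features, compare with labels, update model parameters". The data, the update rule, and the binary labels are illustrative only — the patent's model is not a perceptron:

```python
def train(samples, labels, epochs=20, lr=0.1):
    """Tiny perceptron-style trainer: a stand-in for determining the
    model parameters matched with the text processing model."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            # Move the parameters toward the target label on each error
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Linearly separable toy "feature sets" with their target labels
samples = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [0.9, 0.8]]
labels = [0, 1, 0, 1]
w, b = train(samples, labels)
pred = 1 if sum(wi * xi for wi, xi in zip(w, [0.95, 0.9])) + b > 0 else 0
print(pred)
# → 1
```

The essential correspondence is that the trained `(w, b)` play the role of the "model parameters matched with the text processing model" that the training module determines.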
In the above scheme,
the information processing module is used for detecting processing results of different users on the text content and corresponding operation parameters;
the information processing module is used for forming historical data indexes respectively corresponding to the different users according to the processing result of the text content and the corresponding operation parameters; wherein the historical data index is used for evaluating the target text generated by the text processing model.
In the above scheme,
the information processing module is used for sending the text content and the corresponding matched target text to a blockchain network, so that a node of the blockchain network fills the text content and the corresponding matched target text into a new block and, when consensus is reached on the new block, appends the new block to the end of the blockchain.
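Filling the text content and target text into a new block and appending it to the chain tail can be sketched as follows. Consensus itself is elided, and the block layout and hashing scheme are assumptions for illustration:

```python
import hashlib
import json

def make_block(prev_hash, payload):
    """Fill a payload into a new block linked to the previous block's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return {"prev": prev_hash, "payload": payload,
            "hash": hashlib.sha256(body.encode()).hexdigest()}

chain = [make_block("0" * 64, {"genesis": True})]
# Consensus is out of scope here; append once the block is agreed on
new_block = make_block(chain[-1]["hash"], {"text_content": "selected sentence",
                                           "target_text": "selected"})
chain.append(new_block)
print(new_block["prev"] == chain[0]["hash"])
# → True
```

The `prev` field carrying the previous block's hash is what makes the structure a chain: tampering with an earlier block's payload changes its hash and breaks every later link.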
An embodiment of the present invention further provides an electronic device, where the electronic device includes:
a memory for storing executable instructions;
and a processor for implementing the aforementioned intelligent text processing method when executing the executable instructions stored in the memory.
Embodiments of the present invention also provide a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the aforementioned intelligent text processing method.
The embodiment of the invention has the following beneficial effects:
Text content corresponding to a selection operation on a touch screen is acquired; a feature vector matched with the text content is extracted; at least one word-level hidden variable corresponding to the text content is determined from the feature vector; candidate words corresponding to the hidden variables, together with their selection probabilities, are generated from the at least one word-level hidden variable; at least one candidate word is selected according to its selection probability to form a target text corresponding to the text content; and the target text is displayed on the touch screen in a display mode corresponding to the selection operation. In this way, while a user is selecting text through the touch screen, the target text the user intends to select can be predicted by the corresponding text processing model and output for the user to choose. The scheme of the invention thus intelligently generates high-quality target texts, reduces the repeated selection operations caused by selection errors, and improves the user experience.
Drawings
Fig. 1 is a schematic view of a usage scenario of an intelligent text processing method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 3 is an optional schematic flow chart of the intelligent text processing method according to the embodiment of the present invention;
FIG. 4 is a diagram illustrating an alternative process of the text processing model in an embodiment of the present invention;
FIG. 5 is a schematic diagram of an alternative processing procedure of a text processing method according to an embodiment of the present invention;
fig. 6 is a schematic flow chart illustrating an alternative process of processing text information of a text processing model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of the intelligent text processing apparatus 100 according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a block chain in the block chain network 200 according to an embodiment of the present invention;
fig. 9 is a functional architecture diagram of a blockchain network 200 according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating an application environment for text selection according to the related art in an embodiment of the present invention;
FIG. 11 is a diagram illustrating an application environment for text selection according to the related art in an embodiment of the present invention;
FIG. 12 is a diagram illustrating an application environment for text selection according to the related art in an embodiment of the present invention;
FIG. 13 is a diagram illustrating a working process of a text processing model according to an embodiment of the present invention;
FIG. 14A is a diagram illustrating a text selection of a text processing model according to an embodiment of the present invention;
FIG. 14B is a diagram illustrating a training process of a text processing model according to an embodiment of the present invention;
FIG. 15 is a diagram illustrating a data structure of a text processing model according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the invention; all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within its scope of protection.
In the following description, reference is made to "some embodiments", which describe a subset of all possible embodiments; it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with one another where there is no conflict.
Before the embodiments of the present invention are described in further detail, the terms and expressions used in the embodiments are explained as follows.
1) "In response to" indicates the condition or state on which a performed operation depends: when the condition or state is satisfied, the one or more operations performed may be executed in real time or with a set delay. Unless otherwise specified, there is no restriction on the order in which the operations are executed.
2) Word segmentation: splitting the text of a complete sentence into individual words. For example, "Liu XX is a Chinese singer" segments into: "Liu XX", "is", "Chinese", "singer".
3) Word segmentation library: the word dictionary used by a specific word segmentation method; the word dictionaries corresponding to different segmentation libraries can be used to perform word segmentation processing on the corresponding text information.
4) Consistency: the property that the data returned by accesses to different servers is always identical.
5) Down-sampling: sampling a sequence by taking one sample every several samples, so that the new sequence obtained is a down-sampled version of the original. For example, for an image I of size M × N, s-fold down-sampling yields an image of resolution (M/s) × (N/s), where s should be a common divisor of M and N.
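The definition above can be checked with a few lines: for a 4×4 image and s = 2, keeping every s-th pixel in each dimension yields a 2×2 image:

```python
def downsample(image, s):
    """s-fold down-sampling: keep every s-th pixel in each dimension."""
    return [row[::s] for row in image[::s]]

# 4x4 toy image, s = 2 (s divides both dimensions), result is 2x2
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(downsample(image, 2))
# → [[1, 3], [9, 11]]
```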
6) Model training: multi-class learning on an image data set. The model can be built with deep learning frameworks such as TensorFlow or PyTorch, combining multiple neural network layers such as CNNs into a multi-class model. The model's input is a three-channel or original-channel matrix obtained by reading an image with a tool such as OpenCV; its output is a set of multi-class probabilities, with the final webpage category produced by an algorithm such as softmax. During training, an objective function such as cross entropy drives the model toward correct predictions.
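The softmax output and cross-entropy objective mentioned above can be made concrete in a few lines; the logits are arbitrary example values:

```python
import math

def softmax(logits):
    # Max-shift for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    # The objective training drives down for the correct class
    return -math.log(probs[true_index])

logits = [2.0, 0.5, 0.1]
probs = softmax(logits)
# Probabilities sum to 1; the loss is smaller when the true class
# already receives high probability
print(round(sum(probs), 6), cross_entropy(probs, 0) < cross_entropy(probs, 2))
```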
7) Transactions: equivalent to the computer term "transaction"; they include the operations that need to be submitted to a blockchain network for execution, and do not refer solely to transactions in the commercial sense. Embodiments of the present invention follow this convention, given its colloquial use in blockchain technology.
8) Blockchain: a storage structure for encrypted, chained transactions formed from blocks.
9) Blockchain Network: the set of nodes that incorporate new blocks into a blockchain by consensus.
10) Ledger: the collective term for the blockchain (also called ledger data) and the state database kept in sync with the blockchain.
11) Smart Contracts, also known as chaincode or application code: programs deployed in the nodes of a blockchain network; the nodes execute the smart contracts called in received transactions to update or query the key-value data of the state database.
12) Consensus: a process in a blockchain network used to reach agreement among the nodes involved on the transactions in a block; the agreed block is appended to the end of the blockchain. Mechanisms for achieving consensus include Proof of Work (PoW), Proof of Stake (PoS), Delegated Proof of Stake (DPoS), Proof of Elapsed Time (PoET), and so on.
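Of the consensus mechanisms listed, Proof of Work is the simplest to sketch: search for a nonce such that the hash of the data plus the nonce starts with a required number of zeros. The difficulty of 2 leading zeros here keeps this toy search fast:

```python
import hashlib

def proof_of_work(data, difficulty=2):
    """Toy PoW: find a nonce whose SHA-256 digest starts with
    `difficulty` hex zeros."""
    nonce = 0
    prefix = "0" * difficulty
    while True:
        digest = hashlib.sha256(f"{data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

nonce, digest = proof_of_work("block payload")
print(digest[:2])
# → 00
```

Raising `difficulty` makes the expected search exponentially longer, which is what makes rewriting agreed blocks computationally expensive in real PoW networks.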
13) Convolutional Neural Networks (CNNs): a class of feedforward neural networks that contain convolution computations and have deep structure, and one of the representative algorithms of deep learning. Convolutional neural networks have a representation learning capability and can perform shift-invariant classification of input information according to their hierarchical structure.
Fig. 1 is a schematic view of a usage scenario of the intelligent text processing method according to an embodiment of the present invention. Referring to fig. 1, terminals (including terminal 10-1 and terminal 10-2) are provided with clients capable of executing different functions; the terminals acquire different text information for browsing from the corresponding server 200 through the network 300. The network 300 may be a wide area network, a local area network, or a combination of the two, with data transmission over wireless links. The types of text information acquired by the terminals (including terminal 10-1 and terminal 10-2) from the server 200 may differ: a terminal may acquire any type of text information from the corresponding server 200 through the network 300, or only the text information matching a corresponding retrieval instruction. The server 200 may store text information, or the corresponding inverted indexes, for word segmentation processing through different word segmentation libraries. In some embodiments of the invention, the different types of text information maintained in the server 200 may be written in the software code environments of different programming languages, and code objects may be different types of code entities. For example, in C-language software code a code object may be a function; in JAVA-language software code a code object may be a class; and in the OC language of an iOS terminal it may be a target code.
In C++ software code, a code object may be a class or a function that executes text processing instructions from different terminals. The sources of the text information to be processed by the text processing model are not further distinguished in the present application.
While the server 200 transmits the different types of text information to the terminals (terminal 10-1 and/or terminal 10-2) through the network 300, it needs to determine, for monitoring, the text information selected by the user. As an example, the server 200 is configured to: obtain text content corresponding to a selection operation on the touch screen; extract a feature vector matched with the text content; determine, through the text processing model and according to the feature vector, at least one word-level hidden variable corresponding to the text content; generate, through the text processing model and from the at least one word-level hidden variable, candidate words corresponding to the hidden variables and their selection probabilities; select at least one candidate word according to its selection probability to form a target text corresponding to the text content; and display the target text on the touch screen in a display mode corresponding to the selection operation. This outputs the target text, segments the text content into different target texts matched with the user's operation, and makes it easier for the user to perform subsequent operations.
As described in detail below, the electronic device according to the embodiment of the present invention may be implemented in various forms, such as a dedicated terminal with a text processing function or an electronic device with a text processing function, for example the server 200 in fig. 1. Fig. 2 is a schematic diagram of the composition of an electronic device according to an embodiment of the present invention. It is understood that fig. 2 shows only an exemplary structure, not the whole structure, and that part or all of the structure shown in fig. 2 may be implemented as needed.
The electronic equipment provided by the embodiment of the invention comprises: at least one processor 201, memory 202, user interface 203, and at least one network interface 204. The various components in the electronic device 20 are coupled together by a bus system 205. It will be appreciated that the bus system 205 is used to enable communications among the components. The bus system 205 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 205 in fig. 2.
The user interface 203 may include, among other things, a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, or a touch screen.
It will be appreciated that the memory 202 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The memory 202 in embodiments of the present invention is capable of storing data to support operation of the terminal (e.g., 10-1). Examples of such data include: any computer program, such as an operating system and application programs, for operating on a terminal (e.g., 10-1). The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application program may include various application programs.
In some embodiments, the intelligent text processing apparatus provided in the embodiments of the present invention may be implemented by a combination of hardware and software, and by way of example, the intelligent text processing apparatus provided in the embodiments of the present invention may be a processor in the form of a hardware decoding processor, which is programmed to execute the intelligent text processing method provided in the embodiments of the present invention. For example, a processor in the form of a hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
As an example of the intelligent text processing apparatus implemented by a combination of software and hardware, the apparatus provided by the embodiment of the present invention may be embodied directly as a combination of software modules executed by the processor 201. The software modules may be located in a storage medium, the storage medium is located in the memory 202, and the processor 201 reads the executable instructions included in the software modules in the memory 202 and, in combination with the necessary hardware (for example, the processor 201 and other components connected to the bus 205), completes the intelligent text processing method provided by the embodiment of the present invention.
By way of example, the processor 201 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, discrete gate or transistor logic, or discrete hardware components; the general-purpose processor may be a microprocessor or any conventional processor.
As an example of the intelligent text processing apparatus provided by the embodiment of the present invention implemented by hardware, the apparatus provided by the embodiment of the present invention may be implemented by directly using the processor 201 in the form of a hardware decoding processor, for example, by being executed by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components, to implement the intelligent text processing method provided by the embodiment of the present invention.
The memory 202 in embodiments of the present invention is used to store various types of data to support the operation of the electronic device 20. Examples of such data include: any executable instructions for operating on the electronic device 20; the program implementing the intelligent text processing method of embodiments of the present invention may be included in these executable instructions.
In other embodiments, the intelligent text processing apparatus provided by the embodiment of the present invention may be implemented in software. Fig. 2 shows the intelligent text processing apparatus 2020 stored in the memory 202, which may be software in the form of programs, plug-ins, and the like, comprising a series of modules. As an example of a program stored in the memory 202, the intelligent text processing apparatus 2020 may include the following software modules: an information transmission module 2081 and an information processing module 2082. When the software modules in the intelligent text processing apparatus 2020 are read into RAM by the processor 201 and executed, their functions are as follows:
the information transmission module 2081 is used for acquiring text contents corresponding to selection operations in the touch screen;
the information processing module 2082 is used for extracting a feature vector matched with the text content;
the information processing module 2082 is configured to determine, according to the feature vector, at least one word-level hidden variable corresponding to text content through the text processing model;
the information processing module 2082 is configured to generate, according to the at least one word-level hidden variable, a candidate word corresponding to the word-level hidden variable and a selected probability of the candidate word through the text processing model;
the information processing module 2082 is configured to select at least one candidate word to form a target text corresponding to the text content according to the selection probability of the candidate word;
the information processing module 2082 is configured to display the target text in a display manner corresponding to the selected operation on the touch screen.
Referring to fig. 3, fig. 3 is an optional flowchart of the intelligent text processing method provided in the embodiment of the present invention, and it can be understood that the steps shown in fig. 3 may be executed by various electronic devices operating the intelligent text processing apparatus, for example, a dedicated terminal, an electronic device, or an electronic device cluster having a retrieval instruction processing function. The following is a description of the steps shown in fig. 3.
Step 301: the intelligent text processing device acquires text content corresponding to the selected operation in the touch screen.
When a user selects text information on a touch screen electronic device (a mobile phone, an iPad, and the like), the limited operable area of the touch screen and the habit of one-handed operation mean that the user cannot accurately select a text by manually controlling a cursor, and often fails to select the intended text, which affects both the speed and the accuracy of text selection. Therefore, the text processing model encapsulated in the intelligent text processing apparatus can acquire the text content (long text) corresponding to the selection operation in the touch screen and generate a corresponding new target text (short text) for the user to select, avoiding repeated text selection operations caused by selection errors.
Step 302: and the intelligent text processing device extracts the feature vector matched with the text content.
In some embodiments of the present invention, extracting the feature vector matching the text content may be implemented by:
triggering a corresponding word segmentation library according to the text parameter information carried by the text content; performing word segmentation processing on the text content through the dictionary of the triggered word segmentation library to form different word-level feature vectors; and denoising the different word-level feature vectors to form a word-level feature vector set corresponding to the text content. Here, the term "word segmentation" carries both a verb sense and a noun sense. In the verb sense, it refers to the process of dividing text into minimal semantic units, i.e., words or phrases with definite meaning; for different users, or different usage environments of the text processing model, the types of minimal semantic units to be divided differ and must be adjusted in time. In the noun sense, it refers to a minimal semantic unit obtained after such division, that is, a word produced by the segmentation process. To distinguish the two meanings, the minimal semantic unit referred to by the latter sense is sometimes called a segmentation object (Term); this application uses the term segmentation object, which corresponds to a keyword used as an index basis in an inverted list. For Chinese, words as minimal semantic units are composed of varying numbers of characters and, unlike alphabetic writing, carry no natural separators such as spaces between them; accurately performing word segmentation to obtain reasonable segmentation objects is therefore an important step for Chinese.
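As a concrete illustration of segmenting text content against a triggered segmentation dictionary, the following is a minimal sketch using forward maximum matching. The algorithm choice, the sample dictionary, and the sample text are assumptions for illustration only; the patent does not prescribe a particular segmentation algorithm.

```python
# Hypothetical sketch: forward-maximum-matching word segmentation against a
# triggered segmentation dictionary. Dictionary and text are illustrative.
def segment(text, dictionary, max_len=4):
    """Split `text` into segmentation objects using forward maximum matching."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest dictionary match starting at position i;
        # fall back to a single character when nothing matches.
        for span in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + span]
            if span == 1 or candidate in dictionary:
                words.append(candidate)
                i += span
                break
    return words

dictionary = {"智能", "文本", "处理", "方法"}
print(segment("智能文本处理方法", dictionary))  # ['智能', '文本', '处理', '方法']
```

Each returned segmentation object would then be mapped to its word-level feature vector before denoising.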
In conjunction with the foregoing step 301, different terminal devices (for example, the terminal 10-1 and/or the terminal 10-2 shown in the foregoing fig. 1) may present text information for a user to read or use on their respective touch screen display interfaces (for example, display interfaces of a web page, a dedicated APP, or a WeChat applet), and the user may operate on the displayed text information through the touch screen of the electronic device to select the text information to be used. When the terminal device detects the click operation (or selection operation) on the text, it triggers the server to start a corresponding word segmentation instruction; the word segmentation instruction carries the text parameter information carried by the text content to trigger a word segmentation library matched with the text content, and the server receives the word segmentation instruction and executes the corresponding operations to form different word-level feature vectors. Alternatively, when the terminal device displays different text information on the touch screen display interface and detects a click operation on the text information, the terminal device sends the word segmentation instruction to the server, where the word segmentation instruction specifies the word segmentation library matched with the current user (the word segmentation library carries a corresponding user identifier), and the server receives the word segmentation instruction and executes the corresponding operations to form different word-level feature vectors. It should be noted that the embodiment of the present invention does not limit the triggering manner of the word segmentation instruction.
In some embodiments of the present invention, the language habits and operation habits of different users differ, and different word segmentation methods need to be adjusted for different users to adapt to their language habits. This is especially true for Chinese, which expresses meaning through Chinese characters while the minimal semantic unit that truly carries meaning is the word; because there are no spaces between words to serve as separators as in English, it is uncertain which characters in a sentence form words, so word segmentation of Chinese text is important work. Moreover, the selected text contains content that is valuable only for natural language understanding; for the text processing model, it is necessary to determine which parts are truly valuable bases for retrieving relevant content. Therefore, a word-level feature vector set corresponding to the text content can be formed by denoising the different word-level feature vectors as shown in step 302, avoiding meaningless function-word feature vectors (e.g., for Chinese particles such as "的", "地", and "得") in the word-level feature vector set.
In some embodiments of the present invention, denoising the different word-level feature vectors to form a word-level feature vector set corresponding to the text content may be implemented by:
determining a dynamic noise threshold matched with the usage environment of the text processing model; denoising the different word-level feature vectors according to the dynamic noise threshold, and triggering a dynamic word segmentation strategy matched with the dynamic noise threshold; and performing word segmentation processing on the text content according to the dynamic word segmentation strategy matched with the dynamic noise threshold to form a dynamic word-level feature vector set corresponding to the text content. Because the usage environments of the text processing model differ, the matched dynamic noise threshold differs as well; for example, in an academic translation environment where the terminal displays only the text information of academic papers, the dynamic noise threshold matched with the usage environment of the text processing model needs to be smaller than the dynamic noise threshold in a reading environment of entertainment information text.
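The environment-dependent thresholding described above can be sketched as follows; the environment names, noise scores, and threshold values are illustrative assumptions, not values from the patent.

```python
# Hypothetical sketch: pick a dynamic noise threshold per usage environment and
# drop word-level entries whose noise score meets or exceeds it.
ENV_THRESHOLDS = {
    "academic_translation": 0.2,   # stricter: keep only low-noise vectors
    "entertainment_reading": 0.5,  # looser threshold for casual reading
}

def denoise(word_entries, environment):
    """Keep entries whose noise score is below the environment's threshold."""
    threshold = ENV_THRESHOLDS.get(environment, 0.4)  # assumed default
    return [e for e in word_entries if e["noise"] < threshold]

entries = [
    {"word": "quantum", "noise": 0.1},
    {"word": "um",      "noise": 0.6},
]
print(denoise(entries, "academic_translation"))  # keeps only "quantum"
```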
In some embodiments of the present invention, denoising the different word-level feature vectors to form a word-level feature vector set corresponding to the text content may be implemented by:
determining a fixed noise threshold corresponding to a usage environment of the text processing model;
denoising the different word-level feature vectors according to the fixed noise threshold, and triggering a fixed word segmentation strategy matched with the fixed noise threshold; and performing word segmentation processing on the target text according to the fixed word segmentation strategy matched with the fixed noise threshold to obtain a fixed word-level feature vector set corresponding to the text content. When the text processing model is solidified in a corresponding hardware mechanism, such as a vehicle-mounted terminal or an intelligent medical system, and the usage environment consists of professional-terminology text information (or text information in a certain field), the noise is relatively uniform; fixing a noise threshold corresponding to the text processing model can therefore effectively improve the processing speed of the model, reduce the user's waiting time, and improve the user experience.
In some embodiments of the present invention, extracting the feature vector matching the text content may be implemented by:
performing word segmentation processing on the text content to form a word segmentation processing result; in response to the word segmentation processing result, performing stop-word removal on the text content to form text keywords matched with the text content; and determining a part-of-speech tagging result matched with the text content according to the text keywords, forming a part-of-speech feature vector set corresponding to the text content. The text processed by the text processing model may include not only text information in a single language but also complex multilingual text information (for example, a Chinese-English mixed academic paper). Unlike English, which directly uses spaces as intervals between words, Chinese text requires word segmentation, because words in Chinese are what carry complete information. Correspondingly, the Chinese word segmentation tool Jieba can be used to segment Chinese text. In addition, stop-word removal needs to be applied to the segmented keyword set, because words like "yes" and "can" provide no information helpful to the corresponding category labeling task. For example, for the text "yes, i like doing experiments", segmenting and removing stop words yields a set consisting of the two keywords "like/doing experiments" (using "/" as the separator, the same below), thereby effectively improving the processing speed of the text processing model.
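The segmentation-then-stop-word pipeline above can be sketched minimally as follows. The stop-word list is an illustrative assumption, and plain whitespace tokenization stands in for a real segmenter (the text mentions Jieba for Chinese), so "doing experiments" comes out as two tokens here rather than the single phrase unit a real segmenter might produce.

```python
# Hypothetical sketch: tokenize, normalize, drop stop words, join with "/".
STOP_WORDS = {"yes", "i", "is", "can", "the"}  # assumed stop-word list

def keywords(text):
    """Return the non-stop-word keywords of `text` in order."""
    tokens = [t.strip(",.").lower() for t in text.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]

print("/".join(keywords("yes, i like doing experiments")))
```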
Step 303: and the intelligent text processing device determines at least one word-level hidden variable corresponding to the text content according to the feature vector through the text processing model.
Step 304: and the intelligent text processing device generates candidate words corresponding to the hidden variables of the word level and the selected probability of the candidate words according to the hidden variables of the at least one word level through the text processing model.
Step 305: and the intelligent text processing device selects at least one candidate word to form a target text corresponding to the text content according to the selection probability of the candidate word.
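Steps 303 through 305 can be sketched end-to-end as follows: for each word-level hidden variable, the model emits candidate words with selection probabilities, and the target text is formed by taking the most probable candidate at each position. The candidate words and probabilities below are made up for illustration; greedy selection is only one possible selection policy.

```python
# Hypothetical sketch of steps 303-305: candidate words per hidden-variable
# position, each with a selection probability; pick the most probable word
# at every position to form the target text.
candidates_per_position = [
    [("select", 0.7), ("choose", 0.3)],
    [("this", 0.6), ("that", 0.4)],
    [("sentence", 0.8), ("phrase", 0.2)],
]

def form_target_text(candidates):
    """Greedily pick the highest-probability candidate word at each position."""
    return " ".join(max(pos, key=lambda wp: wp[1])[0] for pos in candidates)

print(form_target_text(candidates_per_position))  # "select this sentence"
```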
It should be noted that the global part of speech of the text information matched with the text content in the present invention refers to each part of speech corresponding to the natural language description information of the text to be described; the global part-of-speech vector is a vector combining each part of speech, and the global part-of-speech vector feature is a feature of that combined vector. A part of speech is an attribute of a character, word, or phrase, and each language defines its own parts of speech. By way of example, Chinese includes, but is not limited to, parts of speech such as nouns, verbs, adjectives, quantifiers, adverbs, and prepositions; English includes, but is not limited to, nouns, verbs, gerunds, adjectives, adverbs, articles, and prepositions; other languages may include other parts of speech, which are not described in detail herein. The part-of-speech vector is defined relative to a sentence described in natural language; a sentence is usually composed of two or more words, and the part-of-speech vector feature is the combination of the part-of-speech features of each word in the sentence.
Referring to fig. 4, fig. 4 is a schematic diagram of an optional processing procedure of the text processing model in the embodiment of the present invention. The encoder may include a convolutional neural network: after the feature vector set is input to the encoder, a corresponding floating point feature vector for the feature vector set is output. Specifically, the feature vector set is input to the convolutional neural network in the encoder, which extracts the corresponding floating point feature vector and outputs it as the output of the encoder; corresponding text information processing is then executed using this floating point feature vector. Alternatively, the encoder may include both a convolutional neural network and a recurrent neural network; in that case, after the feature vector set is input to the encoder, a corresponding floating point feature vector carrying timing information is output, as shown by the encoder in fig. 4. Specifically, the feature vector set is input to the convolutional neural network in the encoder (e.g., the CNN network in fig. 4), which extracts the corresponding floating point feature vector; this vector is then input to the recurrent neural network in the encoder (corresponding to the hidden states h_{i-1}, h_i, and so on in fig. 4), which extracts and fuses the timing information of the convolutional features. The recurrent neural network outputs a floating point feature vector carrying the timing information as the output of the encoder, and the corresponding processing steps are then executed using this output.
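The two-stage encoder of fig. 4 can be sketched in miniature as below: a 1-D convolution over the word-level feature vectors, followed by a simple recurrent pass whose hidden states h_{i-1}, h_i fuse the timing information. All dimensions and weights are toy values for illustration, not the patent's actual architecture.

```python
# Hypothetical sketch of the fig. 4 encoder: convolution stage, then a
# recurrent stage carrying timing information across positions.
import numpy as np

rng = np.random.default_rng(0)
seq = rng.normal(size=(6, 8))          # 6 word-level vectors of dimension 8

# Convolution stage: slide a width-3 filter bank over the sequence.
W_conv = rng.normal(size=(3 * 8, 16))  # flattened window -> 16 features
conv_out = np.stack([
    np.tanh(seq[i:i + 3].reshape(-1) @ W_conv) for i in range(len(seq) - 2)
])

# Recurrent stage: h_i = tanh(W_h h_{i-1} + W_x x_i) fuses timing information.
W_h = rng.normal(size=(16, 16)) * 0.1
W_x = rng.normal(size=(16, 16)) * 0.1
h = np.zeros(16)
for x in conv_out:
    h = np.tanh(W_h @ h + W_x @ x)

print(h.shape)  # the floating point feature vector carrying timing info
```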
Referring to fig. 5, fig. 5 is a schematic diagram of an optional processing procedure of the text processing method according to the embodiment of the present invention. The dual-flow long short-term memory network may include a bidirectional vector model, an attention model, a fully connected layer, and a sigmoid classifier. The bidirectional vector model performs recursive processing on the different feature vectors in the feature vector set of the input text content and combines the processed feature vectors into a longer vector; for example, the part-of-speech feature vectors are combined into a longer vector, and the two combined vectors are combined again into a still longer vector. Two fully connected layers are then used to map the learned distributed feature representations to the corresponding sample label space, improving the accuracy of the final classification result, and finally the sigmoid classifier determines the probability values of the text content corresponding to the respective labels, integrating the target text to form new text information corresponding to the text content.
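The final classification stage described above can be sketched as a combined vector mapped through two fully connected layers and squashed by a sigmoid into a per-label probability. All dimensions and weights below are toy illustrative values, and the attention model is omitted.

```python
# Hypothetical sketch: two dense layers then a sigmoid, yielding a probability
# in (0, 1) for a single label; real models would score every label.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify(combined_vector, w1, w2):
    """Map the combined vector through two fully connected layers + sigmoid."""
    hidden = [max(0.0, sum(a * b for a, b in zip(combined_vector, row)))
              for row in w1]                      # first layer with ReLU
    score = sum(h * w for h, w in zip(hidden, w2))  # second layer
    return sigmoid(score)

vec = [0.5, -0.2, 0.8, 0.1]                 # combined feature vector (toy)
w1 = [[0.2, 0.1, -0.3, 0.4], [0.0, 0.5, 0.2, -0.1]]
w2 = [0.7, -0.4]
print(classify(vec, w1, w2))                # a label probability in (0, 1)
```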
Here, the batch size of the convolutional neural network model is selected as 32 or 64, the optimizer of the convolutional neural network model is selected as the adaptive optimizer (Adam) with an initial learning rate of 0.0001, and the random inactivation (dropout) rate is selected as 0.3. After 10000 iterations of training, the accuracy on both the training set and the test set stabilizes above 98%, indicating that the model matches the task scenario and can achieve an ideal training effect; all parameters of the convolutional neural network model are fixed in this state.
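The hyperparameters quoted above, collected as a hypothetical training configuration; the framework-specific training loop is omitted, and the key names are illustrative.

```python
# Hypothetical config collecting the stated hyperparameters.
train_config = {
    "batch_size": 32,        # 32 or 64 per the text
    "optimizer": "adam",     # adaptive optimizer
    "learning_rate": 1e-4,   # initial learning rate 0.0001
    "dropout": 0.3,          # random inactivation rate
    "iterations": 10_000,    # accuracy stabilizes above 98% by this point
}
print(train_config["learning_rate"])
```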
Step 306: and displaying the target text in a display mode corresponding to the selected operation in the touch screen.
In this way, the intelligent text processing apparatus outputs the target text, dividing the text content into different target texts matched with the user's operation to facilitate the user's subsequent operations.
Referring to fig. 6, fig. 6 is an optional flowchart of a text information processing method of a text processing model according to an embodiment of the present invention. It can be understood that the steps shown in fig. 6 may be executed by various electronic devices running a text information processing apparatus of the text processing model, for example, a dedicated terminal, a server, or a server cluster with the text information processing function of the text processing model, which trains the text processing model to determine model parameters adapted to it. The method specifically includes the following steps:
step 601: and the server acquires a training sample matched with the use environment of the text processing model.
In some embodiments of the present invention, obtaining training samples may be accomplished by:
detecting the processing results of different users on the text content and the corresponding operation parameters; and forming historical data indexes respectively corresponding to the different users according to the processing results of the text content and the corresponding operation parameters, where the historical data indexes are used for evaluating the target text generated by the text processing model. Because the language habits and operation requirements of different users differ, detecting the processing results and corresponding operation parameters of different users not only yields a training sample for a particular user, but also allows the training samples of different users to be fused into a universal training sample set, so that the universality of the text processing model can be trained.
Step 602: and the server extracts a characteristic set matched with the training sample through the text processing model.
Step 603: and the server trains the text processing model according to the feature set matched with the training sample and the corresponding target text label so as to determine model parameters matched with the text processing model.
In some embodiments of the invention, the method further comprises:
and sending the text content and the corresponding target text matched with the text content to a blockchain network, so that nodes of the blockchain network fill the text content and the corresponding matched target text into a new block and, when consensus is reached on the new block, append the new block to the tail of the blockchain.
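The uplink step above can be sketched as follows: the text content and its matched target text are filled into a new block that links back to the previous block's hash, then appended to the chain tail. The field names are illustrative, and the consensus process is elided.

```python
# Hypothetical sketch: fill text content + target text into a hash-linked
# block and append it to the chain tail (consensus omitted).
import hashlib
import json

def new_block(prev_hash, text_content, target_text):
    """Build a block linked to `prev_hash` and stamp it with its own hash."""
    body = {"prev": prev_hash, "text": text_content, "target": target_text}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode())
    body["hash"] = digest.hexdigest()
    return body

chain = [new_block("0" * 64, "genesis", "")]
chain.append(new_block(chain[-1]["hash"], "selected long text", "short target text"))
print(chain[-1]["prev"] == chain[0]["hash"])  # True: blocks are hash-linked
```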
Referring to fig. 7, fig. 7 is a schematic structural diagram of the intelligent text processing apparatus 100 according to an embodiment of the present invention, which includes a blockchain network 200 (exemplarily illustrating the consensus node 210-1 to the consensus node 210-3), an authentication center 300, a service agent 400, and a service agent 500, which are respectively described below.
The type of blockchain network 200 is flexible and may be, for example, any of a public chain, a private chain, or a federation chain. Taking a public link as an example, electronic devices such as user terminals and servers of any service entity can access the blockchain network 200 without authorization; taking a federation chain as an example, an electronic device (e.g., a terminal/server) under the jurisdiction of a service entity after obtaining authorization may access the blockchain network 200, and at this time, become a client node in the blockchain network 200.
In some embodiments, the client node may act as a mere observer of the blockchain network 200, i.e., provide functionality supporting a business entity in initiating transactions (e.g., for uplink storage of data or querying of data on the chain), and may implement the functions of the consensus node 210 of the blockchain network 200, such as the ranking function, the consensus service, and the accounting function, by default or selectively (e.g., depending on the specific business requirements of the business entity). Therefore, the data and the business processing logic of the business entity can be migrated into the blockchain network 200 to the maximum extent, and the credibility and traceability of the data and business processing are realized through the blockchain network 200.
Consensus nodes in blockchain network 200 receive transactions submitted from client nodes (e.g., client node 410 attributed to business entity 400, and client node 510 attributed to business entity 500, shown in fig. 7) of different business entities (e.g., business entity 400 and business entity 500, shown in fig. 7), perform the transactions to update the ledger or query the ledger, and various intermediate or final results of performing the transactions may be returned for display in the business entity's client nodes.
For example, the client node 410/510 may subscribe to events of interest in the blockchain network 200, such as transactions occurring in a particular organization/channel in the blockchain network 200, and the corresponding transaction notifications are pushed by the consensus node 210 to the client node 410/510, thereby triggering the corresponding business logic in the client node 410/510.
An exemplary application of the blockchain network is described below, taking an example in which a plurality of service entities access the blockchain network to implement management and processing of text information.
Referring to fig. 7, a plurality of business entities are involved in the management link; for example, the business entity 400 may be an intelligent text processing apparatus based on artificial intelligence, and the business entity 500 may be a display system with a text display (operation) function. Each registers with the certificate authority 300 to obtain its own digital certificate, which includes the public key of the business entity and the digital signature of the certificate authority 300 over the public key and identity information of the business entity. The digital certificate is attached to a transaction together with the business entity's digital signature for the transaction and sent to the blockchain network, so that the blockchain network can take the digital certificate and signature out of the transaction, verify the authenticity of the message (i.e., whether it has been tampered with) and the identity information of the business entity sending the message, and verify according to the identity, for example, whether the entity has the right to initiate the transaction. Clients running on electronic devices (e.g., terminals or servers) hosted by the business entity may request access to the blockchain network 200 to become client nodes.
The client node 410 of the service body 400 is configured to obtain text content corresponding to a selection operation in the touch screen; extracting a feature vector matched with the text content; determining at least one word-level hidden variable corresponding to the text content according to the feature vector through the text processing model; generating candidate words corresponding to the hidden variables of the word level and the selected probability of the candidate words according to the hidden variables of the at least one word level through the text processing model; selecting at least one candidate word to form a target text corresponding to the text content according to the selection probability of the candidate word; and displaying the target text in a display mode corresponding to the selected operation in the touch screen to realize the output of the target text, dividing the text content into different target texts to be matched with the operation of the user, and sending the text content and the corresponding target text matched with the text content to the blockchain network 200.
To send the text content and the generated target text to the blockchain network 200, business logic may be set in the client node 410 in advance so that, when corresponding text information is formed, the client node 410 automatically sends the text content and the corresponding matched target text to the blockchain network 200; alternatively, a business person of the business entity 400 logs in to the client node 410, manually packages the text content and the generated target text, and sends them to the blockchain network 200. Upon sending, the client node 410 generates a transaction corresponding to the update operation based on the text content and the corresponding matched target text, and specifies in the transaction the smart contract that needs to be invoked to implement the update operation and the parameters passed to the smart contract; the transaction also carries the digital certificate of the client node 410 and a signed digital signature (e.g., obtained by encrypting a digest of the transaction using the private key in the digital certificate of the client node 410). The client node 410 then broadcasts the transaction to the consensus nodes 210 in the blockchain network 200.
When a consensus node 210 in the blockchain network 200 receives the transaction, it verifies the digital certificate and digital signature carried by the transaction; after successful verification, it determines whether the business entity 400 has the transaction right according to the identity of the business entity 400 carried in the transaction. The transaction fails if either the digital signature verification or the right verification fails. After successful verification, the consensus node 210 appends its own digital signature (e.g., by encrypting a digest of the transaction using the private key of the node 210-1) and continues to broadcast the transaction in the blockchain network 200.
After receiving the successfully verified transaction, the consensus node 210 in the blockchain network 200 fills the transaction into a new block and broadcasts the new block. When the consensus node 210 broadcasts the new block in the blockchain network 200, a consensus process is performed on the new block; if the consensus succeeds, the new block is appended to the tail of the blockchain stored by the node, the state database is updated according to the transaction result, and the transaction in the new block is executed: for a transaction submitting updated text content and the corresponding matched target text, a key-value pair comprising the text content and the corresponding matched target text is added to the state database.
A service person of the service agent 500 logs in the client node 510, inputs a text content or text information query request, the client node 510 generates a transaction corresponding to an update operation/query operation according to the text content or text information query request, specifies an intelligent contract that needs to be called to implement the update operation/query operation and parameters transferred to the intelligent contract in the transaction, and broadcasts the transaction to the consensus node 210 in the blockchain network 200, where the transaction also carries a digital certificate of the client node 510 and a signed digital signature (for example, a digest of the transaction is encrypted by using a private key in the digital certificate of the client node 510).
After receiving the transaction, the consensus node 210 in the blockchain network 200 verifies the transaction, fills it into a block, reaches consensus, appends the filled new block to the tail of the blockchain stored by the node, updates the state database according to the transaction result, and executes the transaction in the new block: for a submitted transaction updating the text content of a certain text and the corresponding matched target text, the key-value pair corresponding to that text content in the state database is updated according to the different target texts; for a submitted transaction querying certain text content, the key-value pair corresponding to that text content is queried from the state database, and the transaction result is returned.
It should be noted that fig. 7 exemplarily shows the process of directly uplinking the text content together with the generated target text, but in other embodiments, when the data size of the text content is large, the client node 410 may uplink the hash of the text content and the hash of the corresponding target text in pairs, and store the original text content and the corresponding target text in a distributed file system or a database. After obtaining the text content and the corresponding target text information from the distributed file system or the database, the client node 510 may verify them against the corresponding hashes in the blockchain network 200, thereby reducing the workload of uplink operations.
As an example of a blockchain, referring to fig. 8, fig. 8 is a schematic structural diagram of a blockchain in the blockchain network 200 according to an embodiment of the present invention. The header of each block may include the hash values of all transactions in the block as well as the hash values of all transactions in the previous block; records of newly generated transactions are filled into a block, and after consensus by the nodes in the blockchain network the block is appended to the tail of the blockchain, forming chained growth. The chain structure based on hash values between blocks ensures the tamper resistance and forgery prevention of the transactions in the blocks. The text content stored in the blockchain network may be dedicated text in a certain field (for example, case information of a medical system or experimental data text in a scientific experiment), and storing it in the blockchain network enables sharing of the text content among different nodes.
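The tamper-resistance property described above can be demonstrated with a minimal sketch: because each block records the previous block's hash, altering any stored transaction breaks verification of every later block. The field names are illustrative assumptions.

```python
# Hypothetical sketch: hash-linked blocks; tampering with an earlier block's
# transactions invalidates the chain on re-verification.
import hashlib
import json

def block_hash(block):
    """Hash a block's previous-hash pointer and its transaction list."""
    payload = {"prev": block["prev"], "txs": block["txs"]}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def verify(chain):
    """Check every block's stored prev hash against its predecessor's hash."""
    return all(chain[i]["prev"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

b0 = {"prev": "0" * 64, "txs": ["case record A"]}
b1 = {"prev": block_hash(b0), "txs": ["experiment data B"]}
chain = [b0, b1]
print(verify(chain))          # True
b0["txs"][0] = "tampered"     # modifying an earlier transaction...
print(verify(chain))          # ...breaks the hash link: False
```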
An exemplary functional architecture of the blockchain network provided in the embodiment of the present invention is described below. Referring to fig. 9, fig. 9 is a functional architecture schematic diagram of the blockchain network 200 provided in the embodiment of the present invention, which includes an application layer 201, a consensus layer 202, a network layer 203, a data layer 204, and a resource layer 205, described below in turn.
The resource layer 205 encapsulates the computing, storage, and communication resources that implement each node 210 in the blockchain network 200.
The data layer 204 encapsulates various data structures that implement the ledger, including blockchains implemented in files in a file system, state databases of the key-value type, and presence certificates (e.g., hash trees of transactions in blocks).
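The "hash tree of transactions in a block" mentioned as a presence certificate is a Merkle tree; a minimal sketch of computing its root is below. The odd-level duplication rule is one common convention and is an assumption here, not something the patent specifies.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(tx_list):
    """Root of a binary hash tree built over transaction payloads."""
    level = [h(tx.encode()) for tx in tx_list]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0].hex()

root = merkle_root(["tx-A", "tx-B", "tx-C"])
assert root == merkle_root(["tx-A", "tx-B", "tx-C"])   # deterministic
assert root != merkle_root(["tx-A", "tx-X", "tx-C"])   # any change alters the root
```

Storing only the root in the block header lets a node prove a transaction's presence without shipping every transaction.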
The network layer 203 encapsulates the functions of the Peer-to-Peer (P2P) network protocol, the data propagation and data verification mechanisms, the access authentication mechanism, and business entity identity management.
The P2P network protocol implements communication between the nodes 210 in the blockchain network 200; the data propagation mechanism ensures the propagation of transactions in the blockchain network 200; and the data verification mechanism ensures reliable data transmission between the nodes 210 based on cryptographic methods (e.g., digital certificates, digital signatures, public/private key pairs). The access authentication mechanism authenticates the identity of a business entity joining the blockchain network 200 according to the actual business scenario and, when authentication passes, grants the business entity the authority to access the blockchain network 200. Business entity identity management stores the identities of the business entities that are allowed to access the blockchain network 200, as well as their permissions (e.g., the types of transactions they can initiate).
The consensus layer 202 encapsulates the mechanism by which the nodes 210 in the blockchain network 200 agree on a block (i.e., the consensus mechanism), transaction management, and ledger management. The consensus mechanism includes consensus algorithms such as Proof of Stake (PoS), Proof of Work (PoW), and Delegated Proof of Stake (DPoS), and pluggable consensus algorithms are supported.
Transaction management is configured to verify the digital signature carried in a transaction received by a node 210, verify the identity information of the business entity, and determine from that identity information whether the entity has the authority to perform the transaction (by reading the related information from business entity identity management). Every business entity authorized to access the blockchain network 200 holds a digital certificate issued by a certificate authority, and signs its submitted transactions with the private key of its digital certificate, thereby declaring its legal identity.
Ledger management is used to maintain the blockchain and the state database. For a block on which consensus has been reached, it appends the block to the tail of the blockchain, executes the transactions in the block, updates the key-value pairs in the state database when a transaction includes an update operation, and, when a transaction includes a query operation, queries the key-value pairs in the state database and returns the query result to the client node of the business entity. Query operations on multiple dimensions of the state database are supported, including: querying a block according to its block vector number (e.g., the hash value of a transaction); querying a block according to its block hash value; querying a block according to a transaction vector number; querying a transaction according to its transaction vector number; querying the account data of a business entity according to that entity's account (vector number); and querying the blockchain in a channel according to the channel name.
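The update/query behavior of the state database described above can be sketched as a key-value store keyed by text content. The transaction dictionary shape (`op`, `text_content`, `target_text`) is an illustrative assumption.

```python
# World state as key-value pairs: text content -> matched target text.
state_db = {}

def execute_transaction(tx: dict):
    """Apply an update or query transaction to the state database."""
    if tx["op"] == "update":
        # Update operation: overwrite the key-value pair for this text content.
        state_db[tx["text_content"]] = tx["target_text"]
        return None
    if tx["op"] == "query":
        # Query operation: return the stored target text, or None if absent.
        return state_db.get(tx["text_content"])
    raise ValueError(f"unknown op: {tx['op']}")

execute_transaction({"op": "update",
                     "text_content": "case record 17",
                     "target_text": "diagnosis summary"})
result = execute_transaction({"op": "query", "text_content": "case record 17"})
assert result == "diagnosis summary"
```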
The application layer 201 encapsulates various services that the blockchain network can implement, including tracing, crediting, and verifying transactions.
The intelligent text processing method provided by an embodiment of the present invention is described below, taking text processing in a WeChat applet as an example. Fig. 10 is a schematic diagram of an application environment for text selection in the related art. As shown in fig. 10, in the related art a text can be selected only via touch-screen sensing: during use, the terminal receives the signal of a finger touch through the touch screen, positions a cursor according to the touch position, and the user drags the cursor to select the boundary of the desired text. In this process, text can be selected only at the position where the touch screen senses the finger, so selecting the target text is inconvenient when the font is small; moreover, because one-handed operation easily leads to selecting the wrong text or missing characters, the user often has to select the target text several times, wasting operation time.
Further, fig. 11 is a schematic diagram of an application environment for text selection in the related art. As shown in fig. 11, a text may be selected via touch-screen sensing while taking punctuation marks into account. Building on the scheme of fig. 10, when determining the text to be selected, ordinary characters and punctuation marks are distinguished: because a punctuation mark is the boundary of a sentence or half-sentence, when the user's cursor moves near a punctuation mark, the system takes that punctuation mark as the boundary of the target text with higher probability. However, although punctuation is considered, this only assists selection at punctuation boundaries; for text inside a sentence that the user wants to select, for example a word within the sentence shown in fig. 11, it provides no assistance. Meanwhile, personalized recommendation cannot be performed according to users' historical habits (for example, some users need the selected text to include the punctuation and some do not), which is unhelpful for different users.
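The punctuation-assisted boundary heuristic described above can be sketched as a simple "snap" rule. The window size and the punctuation set are illustrative assumptions; the related art describes the behavior only probabilistically.

```python
def snap_to_punctuation(text: str, cursor_index: int, window: int = 2) -> int:
    """If the cursor lands near a punctuation mark, prefer it as the boundary."""
    punctuation = set(",.;:!?")
    lo = max(0, cursor_index - window)
    hi = min(len(text), cursor_index + window + 1)
    for i in range(lo, hi):
        if text[i] in punctuation:
            return i          # snap the selection boundary to the punctuation
    return cursor_index       # otherwise keep the raw touch position

text = "hello world, how are you"
assert snap_to_punctuation(text, 10) == 11   # comma at index 11 is within the window
assert snap_to_punctuation(text, 17) == 17   # no punctuation nearby, keep touch position
```

This also makes the limitation concrete: the rule can only ever snap to punctuation, so it cannot help select a word in the middle of a sentence.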
Further, fig. 12 is a schematic diagram of an application environment for text selection in the related art. As shown in fig. 12, the target text is selected via touch-screen sensing, and word segmentation information is taken into account by performing word segmentation processing on the text. However, in this process only word-level information can assist the selection, so longer phrases or short sentences cannot be selected; moreover, personalized recommendation according to users' historical habits is still impossible (different users and different fields call for different word segmentation methods).
Fig. 13 is a schematic diagram of a working process of the text processing model according to the embodiment of the present invention, which specifically includes the following steps:
step 1301: the server acquires a long text sentence to be processed displayed in the applet;
the acquired long text sentence can correspond to the position touched by the finger in the touch screen.
Step 1302: at least one word-level hidden variable corresponding to the text content is determined by an encoder of the text processing model.
Of course, before the text processing model processes the text content, it needs to be trained to determine the corresponding network parameters.
With continuing reference to fig. 14A and 14B, fig. 14A is a schematic diagram of text selection of a text processing model provided in the embodiment of the present invention, and fig. 14B is a schematic diagram of a training process of the text processing model provided in the embodiment of the present invention, which specifically includes the following steps:
step 1401: training samples matched with WeChat applet users are obtained.
To obtain training samples, user history data needs to be recorded; specifically, the usage data of all users (for different text contents) may be recorded. First, the user touches the screen one or more times to select text, and then performs a further operation on the selected text (e.g., copy, send). The text selected just before that next operation can be regarded as the user's real target text. Meanwhile, the finger positions of the user's touches on the screen are recorded. Of course, to distinguish different users, each user is assigned a unique ID when the training data is recorded, and each sample indicates its user ID.
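A training sample record along these lines might look as follows. The field names are illustrative assumptions; the patent only specifies what each sample must capture (user ID, touch positions, and the text selected before the next operation).

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TrainingSample:
    user_id: str                             # unique ID distinguishing users
    text_content: str                        # full text shown on screen
    touch_positions: List[Tuple[int, int]]   # finger positions on the touch screen
    target_text: str                         # text selected before the next operation

samples: List[TrainingSample] = []

def record_sample(user_id, text, touches, selected):
    # The text selected just before a copy/send is taken as the real target text.
    samples.append(TrainingSample(user_id, text, touches, selected))

record_sample("u001", "meet me at the station at 9", [(120, 48)], "the station")
assert samples[0].user_id == "u001" and samples[0].target_text == "the station"
```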
Step 1402: and extracting various characteristics of the training sample through a characteristic extractor, and inputting the characteristics into a text processing model.
The extracted features include word segmentation (an ID indicating which word each character belongs to), part-of-speech tagging, word vectors, entity recognition, and semantic role analysis (tagged subject/predicate/object). Fig. 15 is a schematic diagram of a data structure of the text processing model in the embodiment of the present invention; each feature is extracted by the feature extractor, and the extracted features can be represented as floating-point vectors.
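One way to turn such per-token features into a single floating-point vector is to concatenate an embedding with one-hot encodings per token. Everything here is an assumption for illustration: the hash-based "embedding", the tag vocabularies, and the concatenation layout stand in for the real word vectors and tag sets, which the patent does not specify.

```python
import hashlib

def hashed_embedding(token: str, dim: int = 8):
    """Toy stand-in for a word vector: a deterministic hash-based embedding."""
    digest = hashlib.md5(token.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

def extract_features(tokens, pos_tags, roles):
    """Concatenate per-token features into one floating-point vector."""
    pos_vocab = {"NOUN": 0, "VERB": 1, "ADJ": 2, "OTHER": 3}
    role_vocab = {"subject": 0, "predicate": 1, "object": 2, "none": 3}
    vec = []
    for tok, pos, role in zip(tokens, pos_tags, roles):
        one_hot_pos = [1.0 if i == pos_vocab.get(pos, 3) else 0.0 for i in range(4)]
        one_hot_role = [1.0 if i == role_vocab.get(role, 3) else 0.0 for i in range(4)]
        vec.extend(hashed_embedding(tok) + one_hot_pos + one_hot_role)
    return vec

v = extract_features(["cats", "chase", "mice"],
                     ["NOUN", "VERB", "NOUN"],
                     ["subject", "predicate", "object"])
assert len(v) == 3 * (8 + 4 + 4)   # 16 floats per token
```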
Step 1403: and inputting the floating point vector into a recurrent neural network model, and predicting a final result through the recurrent neural network.
In this process, the encoder parameters and decoder parameters of the text processing model are iteratively updated until the corresponding loss function converges.
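The iterate-until-convergence pattern can be sketched with a one-parameter toy model and gradient descent. This is not the patent's recurrent network; the squared-error loss, learning rate, and tolerance are assumptions that only illustrate the stopping criterion.

```python
def train(params, samples, lr=0.1, tol=1e-6, max_iters=10_000):
    """Iteratively update parameters until the loss stops improving."""
    def loss(w):
        # Squared-error surrogate for the model's prediction loss.
        return sum((w * x - y) ** 2 for x, y in samples) / len(samples)

    prev = loss(params)
    for _ in range(max_iters):
        grad = sum(2 * (params * x - y) * x for x, y in samples) / len(samples)
        params -= lr * grad
        cur = loss(params)
        if abs(prev - cur) < tol:   # convergence: loss change below tolerance
            break
        prev = cur
    return params

w = train(0.0, [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
assert abs(w - 2.0) < 1e-2          # recovers y = 2x
```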
Step 1404: and finishing the training of the text processing model.
In some embodiments of the present invention, the text processing model may include two sub-models: a global-user text processing model and a current-user text processing model. The data of all users may be used as training samples when training the global-user text processing model, while only the data of the current user is used as training samples when training the current-user text processing model, so as to make the model more specific to that user.
Step 1303: generating, by a decoder of the text processing model, a candidate word corresponding to the word-level hidden variable and a selected probability of the candidate word according to the at least one word-level hidden variable;
step 1304: and selecting at least one candidate word to form a processing result corresponding to the text content according to the selection probability of the candidate word.
Step 1305: and outputting short text sentences or words to a display interface of the WeChat small program to predict the target text to be selected by the user.
Therefore, compared with conventional touch-screen text processing, the technical scheme provided by the present application triggers the text processing model upon detecting the user's operation on the touch screen and predicts the text the user intends to select, which improves selection accuracy and fluency of use, and effectively improves the user experience.
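The decoder's candidate-generation and selection steps (1303 and 1304 above) can be sketched as follows. The probability threshold, the cap on the number of words, and joining the chosen words with spaces are all illustrative assumptions; the patent only states that candidate words are selected by their selection probability to form the target text.

```python
def select_candidates(candidates, threshold=0.2, max_words=3):
    """Pick candidate words by selection probability to form the target text."""
    # candidates: list of (word, probability) pairs produced by the decoder.
    # Note: ordering by probability is a simplification and may not preserve
    # the original word order.
    chosen = [w for w, p in sorted(candidates, key=lambda c: -c[1])
              if p >= threshold]
    return " ".join(chosen[:max_words])

decoder_output = [("the", 0.9), ("station", 0.8), ("at", 0.15), ("meet", 0.05)]
target = select_candidates(decoder_output)
assert target == "the station"
```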
The beneficial technical effects are as follows:
Text content corresponding to a selection operation on a touch screen is acquired; a feature vector matched with the text content is extracted; at least one word-level hidden variable corresponding to the text content is determined from the feature vector by the text processing model; candidate words corresponding to the word-level hidden variables and the selection probabilities of those candidate words are generated from the at least one word-level hidden variable by the text processing model; at least one candidate word is selected according to its selection probability to form the target text corresponding to the text content; and the target text is displayed in a display mode corresponding to the selection operation on the touch screen. Thus, while the user is selecting text on the touch screen, the target text the user intends to select can be predicted by the corresponding text processing model, and the corresponding target text is output for the user to choose. The text processing model can therefore generate high-quality target text, repeated selection operations caused by selection errors are reduced, and the user experience is improved.
The above description is only exemplary of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (14)

1. An intelligent text processing method, characterized in that the method comprises:
acquiring text content corresponding to a selection operation in the touch screen;
extracting a feature vector matched with the text content;
triggering a trained text processing model, and determining at least one word-level hidden variable corresponding to the text content according to the feature vector through the text processing model, wherein the text processing model comprises: the method comprises the steps that a global user text processing model and a current user text processing model are adopted, data of all users are used as training samples when the global user text processing model is trained, and data of the current user are used as the training samples when the current user text processing model is trained, so that a target text generated by the text processing model is evaluated through corresponding historical data indexes in the training stage of the text processing model;
generating candidate words corresponding to the hidden variables of the word level and the selected probability of the candidate words according to the hidden variables of the at least one word level through the text processing model;
selecting at least one candidate word to form a target text corresponding to the text content according to the selection probability of the candidate word through the text processing model;
and displaying the target text in a display mode corresponding to the selection operation in the touch screen.
2. The method of claim 1, wherein the extracting the feature vector matching the text content comprises:
triggering a corresponding word segmentation library according to the text parameter information carried by the text content;
performing word segmentation processing on the text content through the word dictionary of the triggered word segmentation library to form different word-level feature vectors;
and denoising the different word-level feature vectors to form a word-level feature vector set corresponding to the text content.
3. The method of claim 2, wherein the denoising the different word-level feature vectors to form a set of word-level feature vectors corresponding to the text content comprises:
determining a dynamic noise threshold value matched with the use environment of the text processing model;
denoising the different word-level feature vectors according to the dynamic noise threshold, and triggering a dynamic word segmentation strategy matched with the dynamic noise threshold;
and performing word segmentation processing on the text content according to a dynamic word segmentation strategy matched with the dynamic noise threshold value to form a dynamic word level feature vector set corresponding to the text content.
4. The method of claim 2, wherein the denoising the different word-level feature vectors to form a set of word-level feature vectors corresponding to the text content comprises:
determining a fixed noise threshold corresponding to a use environment of the text processing model;
denoising the different word-level feature vectors according to the fixed noise threshold, and triggering a fixed word segmentation strategy matched with the fixed noise threshold;
and performing word segmentation processing on the target text according to a fixed word segmentation strategy matched with the fixed noise threshold to form a fixed word-level feature vector set corresponding to the text content.
5. The method of claim 1, wherein the extracting the feature vector matching the text content comprises:
performing word segmentation processing on the text content to form a word segmentation processing result;
responding to the word segmentation processing result, and performing stop-word removal processing on the text content to form text keywords matched with the text content;
and determining a part-of-speech tagging result matched with the text content according to the text keywords matched with the text content, and forming a part-of-speech feature vector set corresponding to the text content.
6. The method of claim 1, further comprising:
acquiring a training sample matched with the use environment of the text processing model;
extracting a feature set matched with the training sample through the text processing model;
and training the text processing model according to the feature set matched with the training sample and the corresponding target text label so as to determine model parameters matched with the text processing model.
7. The method according to any one of claims 1 to 6, further comprising:
sending the text content and the corresponding target text matched with the text content to a blockchain network, so that
a node of the blockchain network fills the text content and the corresponding target text matched with the text content into a new block, and, when consensus is reached on the new block, appends the new block to the tail of the blockchain.
8. An intelligent text processing apparatus, characterized in that the apparatus comprises:
the information transmission module is used for acquiring text contents corresponding to the selection operation in the touch screen;
the information processing module is used for extracting a characteristic vector matched with the text content;
the information processing module is configured to trigger a trained text processing model, and determine, according to the feature vector, at least one word-level hidden variable corresponding to the text content through the text processing model, where the text processing model includes: the method comprises the steps that a global user text processing model and a current user text processing model are adopted, data of all users are used as training samples when the global user text processing model is trained, and data of the current user are used as the training samples when the current user text processing model is trained, so that a target text generated by the text processing model is evaluated through corresponding historical data indexes in the training stage of the text processing model;
the information processing module is used for generating candidate words corresponding to the hidden variables of the word level and the selected probability of the candidate words according to the hidden variables of the at least one word level through the text processing model;
the information processing module is used for selecting at least one candidate word to form a target text corresponding to the text content according to the selection probability of the candidate word through the text processing model;
and the information processing module is used for displaying the target text in a display mode corresponding to the selection operation in the touch screen.
9. The apparatus of claim 8,
the information processing module is used for triggering the corresponding word segmentation libraries according to the text parameter information carried by the text content;
the information processing module is used for performing word segmentation processing on the text content through the word dictionary of the triggered word segmentation library to form different word-level feature vectors;
and the information processing module is used for denoising the different word-level feature vectors to form a word-level feature vector set corresponding to the text content.
10. The apparatus of claim 9,
the information processing module is used for determining a dynamic noise threshold value matched with the use environment of the text processing model;
the information processing module is used for carrying out denoising processing on the different word-level feature vectors according to the dynamic noise threshold value and triggering a dynamic word segmentation strategy matched with the dynamic noise threshold value;
and the information processing module is used for performing word segmentation processing on the text content according to a dynamic word segmentation strategy matched with the dynamic noise threshold value to form a dynamic word level feature vector set corresponding to the text content.
11. The apparatus of claim 9,
the information processing module is used for determining a fixed noise threshold value corresponding to the use environment of the text processing model;
the information processing module is used for denoising the different word-level feature vectors according to the fixed noise threshold and triggering a fixed word segmentation strategy matched with the fixed noise threshold;
and the information processing module is used for performing word segmentation processing on the target text according to a fixed word segmentation strategy matched with the fixed noise threshold to form a fixed word-level feature vector set corresponding to the text content.
12. The apparatus of claim 8,
the information processing module is used for performing word segmentation processing on the text content to form a word segmentation processing result;
the information processing module is used for responding to the word segmentation processing result and performing stop-word removal processing on the text content to form text keywords matched with the text content;
and the information processing module is used for determining a part-of-speech tagging result matched with the text content according to the text keywords matched with the text content and forming a part-of-speech characteristic vector set corresponding to the text content.
13. An electronic device, characterized in that the electronic device comprises:
a memory for storing executable instructions;
a processor for implementing the intelligent text processing method of any one of claims 1 to 7 when executing the executable instructions stored by the memory.
14. A computer-readable storage medium storing executable instructions, wherein the executable instructions, when executed by a processor, implement the intelligent text processing method of any one of claims 1 to 7.
CN201911362272.5A 2019-12-26 2019-12-26 Intelligent text processing method and device, electronic equipment and storage medium Active CN111026319B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010194392.5A CN111414122B (en) 2019-12-26 2019-12-26 Intelligent text processing method and device, electronic equipment and storage medium
CN201911362272.5A CN111026319B (en) 2019-12-26 2019-12-26 Intelligent text processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911362272.5A CN111026319B (en) 2019-12-26 2019-12-26 Intelligent text processing method and device, electronic equipment and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202010194392.5A Division CN111414122B (en) 2019-12-26 2019-12-26 Intelligent text processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111026319A CN111026319A (en) 2020-04-17
CN111026319B true CN111026319B (en) 2021-12-10

Family

ID=70213573

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010194392.5A Active CN111414122B (en) 2019-12-26 2019-12-26 Intelligent text processing method and device, electronic equipment and storage medium
CN201911362272.5A Active CN111026319B (en) 2019-12-26 2019-12-26 Intelligent text processing method and device, electronic equipment and storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010194392.5A Active CN111414122B (en) 2019-12-26 2019-12-26 Intelligent text processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (2) CN111414122B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111552797B (en) * 2020-04-30 2021-06-22 腾讯科技(深圳)有限公司 Name prediction model training method and device, electronic equipment and storage medium
CN111552890B (en) * 2020-04-30 2021-05-18 腾讯科技(深圳)有限公司 Name information processing method and device based on name prediction model and electronic equipment
CN111723567A (en) * 2020-05-20 2020-09-29 支付宝(杭州)信息技术有限公司 Text selection data processing method, device and equipment
US11068908B1 (en) * 2020-12-22 2021-07-20 Lucas GC Limited Skill-based credential verification by a credential vault system (CVS)
CN114757180A (en) * 2020-12-26 2022-07-15 华为技术有限公司 Method for selecting text, electronic equipment and computer readable storage medium
CN113743089A (en) * 2021-09-03 2021-12-03 科大讯飞股份有限公司 Multilingual text generation method, device, equipment and storage medium
CN113887235A (en) * 2021-09-24 2022-01-04 北京三快在线科技有限公司 Information recommendation method and device

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102937864B (en) * 2012-10-31 2015-11-25 百度在线网络技术(北京)有限公司 A kind of method and apparatus for determining selected text on touch terminal
CN106156163B (en) * 2015-04-15 2021-06-22 株式会社日立制作所 Text classification method and device
CN105824552B (en) * 2015-07-29 2019-05-17 维沃移动通信有限公司 A kind of recognition methods of text information and device
CN105426528B (en) * 2015-12-15 2018-04-06 中南大学 A kind of retrieval ordering method and system of commodity data
CN107220220A (en) * 2016-03-22 2017-09-29 索尼公司 Electronic equipment and method for text-processing
CN108874832B (en) * 2017-05-15 2022-06-10 腾讯科技(深圳)有限公司 Target comment determination method and device
CN107578270A (en) * 2017-08-03 2018-01-12 中国银联股份有限公司 A kind of construction method, device and the computing device of financial label
CN108563624A (en) * 2018-01-03 2018-09-21 清华大学深圳研究生院 A kind of spatial term method based on deep learning
CN108388554B (en) * 2018-01-04 2021-09-28 中国科学院自动化研究所 Text emotion recognition system based on collaborative filtering attention mechanism
CN110032324B (en) * 2018-01-11 2024-03-05 荣耀终端有限公司 Text selection method and terminal
CN108363753B (en) * 2018-01-30 2020-05-19 南京邮电大学 Comment text emotion classification model training and emotion classification method, device and equipment
CN109062937B (en) * 2018-06-15 2019-11-26 北京百度网讯科技有限公司 The method of training description text generation model, the method and device for generating description text
CN110032730B (en) * 2019-02-18 2023-09-05 创新先进技术有限公司 Text data processing method, device and equipment
CN110377881B (en) * 2019-06-11 2023-04-07 创新先进技术有限公司 Integration method, device and system of text processing service
CN110457714B (en) * 2019-06-25 2021-04-06 西安电子科技大学 Natural language generation method based on time sequence topic model
CN110413738A (en) * 2019-07-31 2019-11-05 腾讯科技(深圳)有限公司 A kind of information processing method, device, server and storage medium
CN110597961B (en) * 2019-09-18 2023-10-27 腾讯云计算(北京)有限责任公司 Text category labeling method and device, electronic equipment and storage medium
CN110598224A (en) * 2019-09-23 2019-12-20 腾讯科技(深圳)有限公司 Translation model training method, text processing device and storage medium

Also Published As

Publication number Publication date
CN111414122B (en) 2021-06-11
CN111026319A (en) 2020-04-17
CN111414122A (en) 2020-07-14

Similar Documents

Publication Publication Date Title
CN111026319B (en) Intelligent text processing method and device, electronic equipment and storage medium
CN111026320B (en) Multi-mode intelligent text processing method and device, electronic equipment and storage medium
CN111552799B (en) Information processing method, information processing device, electronic equipment and storage medium
US9318027B2 (en) Caching natural language questions and results in a question and answer system
CN111026858B (en) Project information processing method and device based on project recommendation model
CN112437917A (en) Natural language interface for databases using autonomous agents and thesaurus
CN108959559B (en) Question and answer pair generation method and device
US20190205396A1 (en) Method and system of translating a source sentence in a first language into a target sentence in a second language
US11861319B2 (en) Chatbot conducting a virtual social dialogue
US11966389B2 (en) Natural language to structured query generation via paraphrasing
CN111552797B (en) Name prediction model training method and device, electronic equipment and storage medium
US11194963B1 (en) Auditing citations in a textual document
CN110597963A (en) Expression question-answer library construction method, expression search method, device and storage medium
US11144560B2 (en) Utilizing unsumbitted user input data for improved task performance
CN114429133A (en) Relying on speech analysis to answer complex questions through neuro-machine reading understanding
CN111552798B (en) Name information processing method and device based on name prediction model and electronic equipment
CN110162771A (en) The recognition methods of event trigger word, device, electronic equipment
CN112651236B (en) Method and device for extracting text information, computer equipment and storage medium
WO2020149959A1 (en) Conversion of natural language query
CN111142728B (en) Vehicle-mounted environment intelligent text processing method and device, electronic equipment and storage medium
WO2022141872A1 (en) Document abstract generation method and apparatus, computer device, and storage medium
US11120064B2 (en) Transliteration of data records for improved data matching
CN111552890B (en) Name information processing method and device based on name prediction model and electronic equipment
US20170024405A1 (en) Method for automatically generating dynamic index for content displayed on electronic device
US11675822B2 (en) Computer generated data analysis and learning to derive multimedia factoids

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40022207

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant