CN114579701A - Information processing method, system, device, electronic equipment and storage medium - Google Patents

Information processing method, system, device, electronic equipment and storage medium

Info

Publication number
CN114579701A
Authority
CN
China
Prior art keywords
target
text object
word
candidate
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210176449.8A
Other languages
Chinese (zh)
Inventor
姜典转
崔力娟
田甘迅
高建
林荣逸
王丛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210176449.8A
Publication of CN114579701A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results

Abstract

The disclosure provides an information processing method, system, and device, an electronic device, and a storage medium, and relates to the field of big data processing and the like. The specific implementation scheme is as follows: acquiring initial entity words in a text object; in response to the initial entity words having matching information, taking the initial entity words as candidate words, and taking matching content as display content associated with the candidate words, wherein the matching content corresponds to the matching information; and updating the text object based on the display content associated with the candidate words, and saving the updated text object as a processed text object. With the embodiments of the disclosure, when the processed text object is displayed, the content associated with words in it can be viewed more efficiently and conveniently, which improves overall efficiency.

Description

Information processing method, system, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technology, and more particularly, to the field of big data processing technology.
Background
With the development of computer technology, data volumes have grown rapidly, and users can promptly obtain and view large numbers of articles according to their own needs. However, how to let a user who is viewing an article more quickly obtain or view information related to a word in that article remains a problem to be solved.
Disclosure of Invention
The disclosure provides an information processing method, system, device, electronic device and storage medium.
According to a first aspect of the present disclosure, there is provided an information processing method including:
acquiring initial entity words in the text object;
in response to the initial entity words having matching information, taking the initial entity words as candidate words, and taking matching content as display content associated with the candidate words, wherein the matching content corresponds to the matching information;
and updating the text object based on the display content associated with the candidate words, and saving the updated text object as a processed text object.
According to a second aspect of the present disclosure, there is provided an information processing system including:
a first server, configured to acquire initial entity words in a text object; in response to the initial entity words having matching information, take the initial entity words as candidate words, and take matching content as display content associated with the candidate words, wherein the matching content corresponds to the matching information; and update the text object based on the display content associated with the candidate words, and save the updated text object as a processed text object.
According to a third aspect of the present disclosure, there is provided an information processing apparatus comprising:
an initial processing module, configured to acquire initial entity words in a text object;
a content processing module, configured to, in response to the initial entity words having matching information, take the initial entity words as candidate words and take matching content as display content associated with the candidate words, wherein the matching content corresponds to the matching information;
an updating module, configured to update the text object based on the display content associated with the candidate words;
and a storage module, configured to save the updated text object as a processed text object.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fifth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the aforementioned method.
According to a sixth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the aforementioned method.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
According to the scheme provided by this embodiment, the initial entity words in the text object are acquired; when an initial entity word has matching information, it is taken as a candidate word and the display content of the candidate word is determined; the text object is then updated based on the display content associated with the candidate words, and the updated text object is saved as a processed text object. As a result, when the processed text object is displayed, the content related to the words in it can be viewed more efficiently and conveniently, which improves overall efficiency.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a first flowchart illustrating an information processing method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a training flow of a target model according to an embodiment of the present disclosure;
FIG. 3 is a second flowchart illustrating an information processing method according to an embodiment of the present disclosure;
FIG. 4 is a first block diagram of an information processing system according to an embodiment of the present disclosure;
FIG. 5 is a second block diagram of an information processing system according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a composition structure of an information processing apparatus according to another embodiment of the present disclosure;
FIG. 7 is a schematic diagram of another composition structure of an information processing apparatus according to another embodiment of the present disclosure;
FIG. 8 is a block diagram of an electronic device for implementing the information processing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An embodiment of a first aspect of the present disclosure provides an information processing method, as shown in fig. 1, including:
S101: acquiring initial entity words in a text object;
S102: in response to the initial entity words having matching information, taking the initial entity words as candidate words, and taking matching content as display content associated with the candidate words, wherein the matching content corresponds to the matching information;
S103: updating the text object based on the display content associated with the candidate words, and saving the updated text object as a processed text object.
The information processing method provided by the embodiment can be applied to the first server.
The text object may be any one of a plurality of candidate text objects stored on the first server. Each of these candidate text objects may be taken as the text object and processed with the information processing method provided in this embodiment, which is not described in detail herein. The text object may specifically be an article to be processed; the candidate text objects may specifically be candidate articles.
Acquiring the initial entity words in the text object may include: matching the text object against a preset word bank, and taking the words in the text object that match preset entity words in the preset word bank as the initial entity words; or processing the text object with a target model to obtain the initial entity words contained in the text object.
The preset word bank may be set in advance according to actual conditions; the word types contained in the preset word bank may also be set according to actual requirements, for example as medical words and the like.
The target model may be a pre-trained model, the input of the target model may be the text object, and the output of the target model may include the initial entity words contained in the text object.
After the initial entity word in the text object is obtained, in response to the initial entity word having matching information, the initial entity word may be taken as a candidate word, and matching content may be taken as display content associated with the candidate word, where the matching content corresponds to the matching information.
Specifically, this may include: judging whether the initial entity word has matching information in a preset information base; and in a case where the initial entity word has matching information in the preset information base, taking the initial entity word as a candidate word, and taking matching content as display content associated with the candidate word, wherein the matching content corresponds to the matching information.
In addition, the method may further include: and under the condition that the initial entity word does not have matching information in a preset information base, determining that the initial entity word is not a candidate word.
The preset information base may be pre-stored in the first server. The preset information base can comprise preset key information and preset content related to the preset key information; the number of the preset key information may be one or more, and the preset content associated with each preset key information may be one or more, where the number is not limited herein. The preset information library may be a KV (key-value) database; correspondingly, the preset key information and the preset content related to the preset key information may be KV pairs.
In the solution provided in this embodiment, the preset information base may specifically be a preset information base for the medical field. The preset key information in the preset information base may include concepts of medical entity knowledge, such as at least one of the synonyms, hypernyms, and hyponyms of symptoms, diseases, medicines, and the like. The preset content corresponding to the preset key information may represent the department corresponding to a certain disease (or symptom), the mall corresponding to a certain medicine, or the like. That is, the concepts included in the preset key information may be normalized to a standard mall medicine base, a standard disease base, and so on, which are not listed exhaustively in this embodiment.
The matching information is one of preset key information in the preset information base; correspondingly, the matching content corresponds to the matching information, namely the matching content is the preset content associated with the matching information.
It should be understood that the number of the candidate words included in the text object may be one or more, and the number is not limited in this embodiment.
The display content corresponding to the candidate word may be the matching content.
With this scheme, the initial entity words in the text object are acquired; when an initial entity word has matching information, it is taken as a candidate word and the display content of the candidate word is determined; the text object is then updated based on the display content associated with the candidate words and saved as a processed text object. As a result, when the processed text object is displayed, the content related to the words in it can be viewed more efficiently and conveniently, which improves overall efficiency.
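Steps S101 to S103 can be sketched roughly as follows. This is a minimal illustration only, not the implementation of the disclosure: the word bank contents, the KV entries, the `TextObject` class, and the function names are all hypothetical, and simple substring matching stands in for whatever matching the first server actually performs.

```python
from dataclasses import dataclass, field

# Hypothetical preset word bank and KV preset information base (illustrative data only).
PRESET_WORD_BANK = {"aspirin", "headache", "fever"}
PRESET_INFO_BASE = {
    "aspirin": "link to a medicine page",
    "headache": "neurology department",
}


@dataclass
class TextObject:
    text: str
    # Maps each candidate word to its associated display content.
    annotations: dict = field(default_factory=dict)


def get_initial_entity_words(text, word_bank):
    """S101: keep the word-bank entries that occur in the text."""
    lowered = text.lower()
    return [word for word in sorted(word_bank) if word in lowered]


def process(text_object, word_bank, info_base):
    """S102 and S103: attach display content and return the processed text object."""
    annotations = {}
    for word in get_initial_entity_words(text_object.text, word_bank):
        if word in info_base:  # the initial entity word has matching information
            annotations[word] = info_base[word]  # matching content becomes display content
    return TextObject(text_object.text, annotations)


processed = process(
    TextObject("Aspirin may relieve a headache or a fever."),
    PRESET_WORD_BANK,
    PRESET_INFO_BASE,
)
print(processed.annotations)
```

In this sketch "fever" is found in the text but has no key in the information base, so it is not treated as a candidate word.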
In one embodiment, the obtaining the initial entity word in the text object may include:
acquiring clauses contained in the text object;
inputting the clauses into a target model to obtain a clause recognition result output by the target model;
and in a case where the type of an entity word contained in the clause recognition result is a preset type, taking the entity word as the initial entity word, and determining the position of the initial entity word in the text object based on the clause position information of the entity word contained in the clause recognition result.
Acquiring the clauses contained in the text object may include: dividing the text object into paragraphs to obtain at least one paragraph contained in the text object; and processing the at least one paragraph to obtain a plurality of clauses contained in the at least one paragraph.
Inputting the clauses into the target model to obtain the clause recognition result output by the target model may specifically include: inputting the plurality of clauses into the target model separately to obtain the clause recognition result output by the target model for each clause.
Among the clause recognition results corresponding to the plurality of clauses, the clause recognition result for each clause may include: the entity word, the type of the entity word, and the clause position information of the entity word.
After the clauses are input into the target model and the clause recognition results output by the target model are obtained, the method may include:
judging whether the type of an entity word contained in an ith clause recognition result corresponding to the ith clause in the plurality of clauses is a preset type or not; i is an integer of 1 or more;
and under the condition that the type of the entity word contained in the ith clause recognition result is a preset type, taking the entity word as the initial entity word.
In addition, the method can also comprise the following steps: and under the condition that the type of the entity word contained in the ith clause recognition result is not a preset type, not taking the entity word as the initial entity word.
The preset type may be set according to actual situations, and may be, for example, a medical type, and the like, which are not exhaustive here.
Determining the position of the initial entity word in the text object based on the clause position information of the entity word contained in the clause recognition result may include:
determining the relative position of the initial entity word in a jth paragraph based on the clause position information of the entity word contained in the ith clause recognition result and the jth paragraph in which the ith clause is located; j is an integer of 1 or more;
determining the position of the initial entity word in the text object based on the relative position of the jth paragraph in the text object and the relative position of the initial entity word in the jth paragraph.
Determining the relative position of the initial entity word in the jth paragraph based on the clause position information of the entity word contained in the ith clause recognition result and the jth paragraph in which the ith clause is located may include: determining the starting position and the length of the initial entity word in the jth paragraph based on the starting position of the ith clause in the jth paragraph and on the starting position and the length of the entity word in the ith clause, as contained in the ith clause recognition result.
The clause position information of the entity word refers to the position of the entity word in the ith clause, and may specifically include the starting position and the length of the entity word in the ith clause.
For example, suppose the ith clause is the A1-th clause, and the clause position information of a certain entity word indicates that the entity word starts at the B1-th word of the A1-th clause and has a length of C1. Suppose further that the A1-th clause starts at the B2-th word of the jth paragraph and has a length of C2. It can then be determined that the initial entity word starts at the (B2+B1)-th word of the jth paragraph and has a length of C1.
The determining the position of the initial entity word in the text object based on the relative position of the jth paragraph in the text object and the relative position of the initial entity word in the jth paragraph may specifically include:
determining a starting position and a length of the initial entity word in the text object based on the starting position of the jth paragraph in the text object and the starting position and the length of the initial entity word in the jth paragraph.
For example, suppose the initial entity word starts at the (B2+B1)-th word of the jth paragraph and has a length of C1, and the jth paragraph starts at the B3-th word of the text object. The initial entity word then starts at the (B3+B2+B1)-th word of the text object and has a length of C1.
Therefore, with this scheme, the clauses in the text object can be recognized by the target model to obtain clause recognition results, and an entity word is taken as an initial entity word when the entity word contained in a clause recognition result is of the preset type. Because recognition is performed clause by clause, problems such as information loss and parsing errors that tend to arise when analyzing long text can be avoided, which ensures recognition accuracy and efficiency. In addition, after the initial entity words are determined from the clause recognition results, their positions are restored to absolute positions in the text object, which ensures that the position information supplied for words displayed on the online side is reliable.
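The clause-level recognition and position restoration described above can be sketched as follows. This is an illustrative sketch under several assumptions: paragraphs are taken to be separated by newlines, clauses by sentence-ending punctuation, positions are 0-based character offsets (so the three offsets add up exactly), and `toy_model` is a hypothetical stand-in for the trained target model.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    word: str
    type: str
    start: int   # start offset of the entity word within its clause (B1)
    length: int  # length of the entity word (C1)


def split_paragraphs(text):
    """Yield (start offset in the text, paragraph) for newline-separated paragraphs."""
    pos = 0
    for para in text.split("\n"):
        if para:
            yield pos, para
        pos += len(para) + 1  # +1 accounts for the newline separator


def split_clauses(paragraph):
    """Yield (start offset in the paragraph, clause), cutting after . ! ? or ;"""
    for match in re.finditer(r"[^.!?;]+[.!?;]?", paragraph):
        yield match.start(), match.group()


def toy_model(clause):
    """Hypothetical stand-in for the target model: tags known words in one clause."""
    known = {"aspirin": "medical", "paris": "place"}
    lowered = clause.lower()
    results = []
    for word, kind in known.items():
        idx = lowered.find(word)
        if idx >= 0:
            results.append(Entity(clause[idx:idx + len(word)], kind, idx, len(word)))
    return results


def locate_entities(text, model=toy_model, preset_type="medical"):
    """Return (word, absolute start, length) for entity words of the preset type."""
    found = []
    for b3, para in split_paragraphs(text):          # paragraph start in the text
        for b2, clause in split_clauses(para):       # clause start in the paragraph
            for ent in model(clause):
                if ent.type == preset_type:          # keep only the preset type
                    found.append((ent.word, b3 + b2 + ent.start, ent.length))
    return found


text = "I went to Paris. There I took aspirin.\nAspirin helped."
for word, start, length in locate_entities(text):
    print(word, start, length, repr(text[start:start + length]))
```

Slicing the original text with each returned start and length yields the entity word again, which is the property the restoration step is meant to guarantee.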
In one embodiment, the using the matching content as the presentation content associated with the candidate word includes:
and under the condition that at least one of the matching content and the candidate word is determined to meet a preset condition, taking the matching content as display content associated with the candidate word in the text object, and setting the candidate word in the text object to be in a first display state.
The method specifically comprises the following steps:
judging whether the preset condition is met based on at least one of the matching content and the candidate word; in a case where the preset condition is determined to be met, taking the matching content as the display content associated with the candidate word in the text object, and setting the candidate word in the text object to the first display state; and in a case where the preset condition is determined not to be met, not taking the matching content as display content associated with the candidate word in the text object, and not setting the candidate word in the text object to the first display state.
Not setting the candidate word in the text object to the first display state may specifically mean keeping the candidate word in the text object in a second display state.
The first display state may refer to display in a preset color. The second display state may refer to display in a normal color; it may also be the display state (or color) of the text in the text object other than the candidate words. The preset color is different from the normal color. The preset color may be set according to actual conditions, for example red, or another color; the normal color may be black, or another color. Any combination in which the preset color differs from the normal color falls within the protection scope of this embodiment.
Therefore, with this scheme, the matching content is used as the display content associated with the candidate word in the text object only when at least one of the matching content and the candidate word is determined to meet the preset condition. This avoids the situation in which so many candidate words in the text object carry associated display content that the user cannot quickly find the candidate word needed, and thereby improves subsequent efficiency.
In one embodiment, the specific process of determining whether the preset condition is met based on at least one of the matching content and the candidate word may include at least one of:
determining that the preset condition is met under the condition that the position of the candidate word in the text object is a preset position;
determining that the preset condition is met under the condition that other candidate words which are the same as the candidate words exist in the text object and the positions of the candidate words in the text object are before the other candidate words;
and determining that the preset condition is met under the condition that the matching content contains the candidate link and the residual quantity of the candidate objects associated with the candidate link is greater than a preset quantity threshold value.
The preset position may be any position other than the following: the first-level title of the text object (that is, the article to be processed), the abstract of the text object, and the accompanying picture of the text object. That is, when the candidate word is located in the first-level title, the abstract, or the accompanying picture of the text object, it is determined that the preset condition is not satisfied; otherwise, it is determined that the preset condition is satisfied.
The position of the candidate word in the text object being before the other candidate words may mean that the position of the candidate word in the text object is the earliest among the positions of the other candidate words in the text object that are the same as the candidate word. That is, the candidate word appears first in the text object compared with the other candidate words that are the same as it.
In other words, a restriction on order of appearance is placed on a candidate word contained in the text object and the other candidate words identical to it. The reason is as follows: when the same candidate word (also called a candidate entity word) appears multiple times in one text object, the same candidate word and its associated display content may appear many times in that text object. With this preset condition, subsequent setting is performed only for the occurrence that appears first, that is, at the earliest position in the text object, which avoids this problem.
The other candidate words that are the same as the candidate word may include: other candidate words having the same text as the candidate word, and/or other candidate words having the same meaning as the candidate word.
The other candidate words having the same meaning as the candidate word may be at least one of: an alias of the candidate word, a synonym of the candidate word, a hypernym of the candidate word, and a hyponym of the candidate word.
For example, the text object (i.e., the article to be processed) includes candidate word-1 and another candidate word-2 with the same text as candidate word-1, and candidate word-1 is located before the other candidate word-2 in the text object, that is, candidate word-1 is the first occurrence in the text object. Then candidate word-1 is a candidate word that satisfies the preset condition; correspondingly, the other candidate word-2 is a candidate word that does not satisfy the preset condition. That is, only the first occurrence of a candidate word is set to the first display state (for example, set to red), which avoids having too many candidate words in the first display state on the page of the text object.
For another example, the text object (i.e., the article to be processed) includes candidate word-1 and another candidate word-3 having the same meaning as candidate word-1, where the other candidate word-3 may be an alias of candidate word-1, and candidate word-1 is located before the other candidate word-3 in the text object, that is, candidate word-1 appears first in the text object. Then candidate word-1 is a candidate word that satisfies the preset condition; correspondingly, the other candidate word-3 is a candidate word that does not satisfy the preset condition. That is, when a candidate word (also called an entity word) is referred to in the article by different names (i.e., there are other candidate words with the same meaning), only the first one to appear is set to the first display state (for example, set to red), which avoids having too many candidate words in the first display state on the page of the text object.
The candidate object may specifically be a medicine or another type of object, which are not listed exhaustively here. That is, the inventory (i.e., the remaining quantity) of the candidate object may also be constrained. Taking a medicine as the candidate object as an example, whether the medicine has enough inventory is queried; only when the inventory is greater than the preset quantity threshold is the preset condition determined to be met, after which the display content associated with the candidate word may be determined and the candidate word set to the first display state (for example, set to red).
The preset quantity threshold may be set according to practical situations, and may be, for example, 100 or 10, or a larger or smaller value, which is not limited here.
It should be understood that the above three processes for determining whether the preset condition is satisfied may be used alone, any two of them may be used in combination, or all three of them may also be used.
For example, the specific process of determining whether a preset condition is satisfied based on at least one of the matching content and the candidate word may include:
under the condition that the position of the candidate word in the text object is a preset position, judging whether other candidate words same as the candidate word exist in the text object or not; determining that the preset condition is satisfied in a case that there is another candidate word that is the same as the candidate word in the text object and a position of the candidate word in the text object is before the another candidate word.
For another example, the specific process of determining whether the preset condition is met based on at least one of the matching content and the candidate word may include:
under the condition that the position of the candidate word in the text object is a preset position, judging whether the matching content contains a candidate link and whether the residual quantity of the candidate object associated with the candidate link is greater than a preset quantity threshold value; and determining that the preset condition is met under the condition that the matching content contains the candidate link and the residual quantity of the candidate objects associated with the candidate link is greater than a preset quantity threshold value.
For another example, the specific process of determining whether the preset condition is met based on at least one of the matching content and the candidate word may include:
under the condition that other candidate words which are the same as the candidate words exist in the text object and the positions of the candidate words in the text object are before the other candidate words, whether the matching content contains candidate links or not and whether the residual quantity of the candidate objects associated with the candidate links is larger than a preset quantity threshold value or not are judged; and determining that the preset condition is met under the condition that the matching content contains the candidate link and the residual quantity of the candidate objects associated with the candidate link is greater than a preset quantity threshold value.
For another example, the specific process of determining whether the preset condition is met based on at least one of the matching content and the candidate word may include:
under the condition that the position of the candidate word in the text object is a preset position, judging whether other candidate words same as the candidate word exist in the text object or not; under the condition that other candidate words which are the same as the candidate words exist in the text object and the positions of the candidate words in the text object are before the other candidate words, whether the matching content contains candidate links or not and whether the residual quantity of the candidate objects associated with the candidate links is larger than a preset quantity threshold value or not are judged; and determining that the preset condition is met under the condition that the matching content contains the candidate link and the residual quantity of the candidate objects associated with the candidate link is greater than a preset quantity threshold value.
It should be noted that the above combinations of two processes for determining whether the preset condition is met are merely examples; in actual processing, the order in which any two combined processes are evaluated may be the same as or different from the order described above. Similarly, the above combination of all three processes is merely an example, and the order in which the three processes are evaluated in actual processing may be the same as or different from the order described above.
Therefore, by adopting the scheme, whether the preset condition is met or not can be judged by combining at least one of the candidate words and the matching content, so that the setting of the candidate words in the text object is more reasonable and clearer, and the use efficiency of a subsequent client is ensured.
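The three checks described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the names `Candidate`, `MatchingContent`, `meaning` and `preset_range` are hypothetical, the preset position is modelled as a range of character offsets, and a candidate word with no same-meaning duplicate is assumed to count as a first occurrence.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    word: str        # surface form found in the text object
    position: int    # character offset in the text object
    meaning: str     # hypothetical key shared by aliases of the same entity


@dataclass
class MatchingContent:
    text: str
    candidate_link: Optional[str] = None  # link to the candidate object, if any
    remaining_quantity: int = 0           # inventory of the linked candidate object


def at_preset_position(cand: Candidate, preset_range: range) -> bool:
    # Assumption: the preset position is a range of character offsets.
    return cand.position in preset_range


def appears_first(cand: Candidate, candidates: Sequence[Candidate]) -> bool:
    # Assumption: a word with no same-meaning duplicates counts as first.
    return all(cand.position <= other.position
               for other in candidates if other.meaning == cand.meaning)


def has_stock(content: MatchingContent, quantity_threshold: int) -> bool:
    return (content.candidate_link is not None
            and content.remaining_quantity > quantity_threshold)


def meets_preset_condition(cand, content, candidates, preset_range,
                           quantity_threshold=100):
    # All three checks combined; any one or two could be used instead.
    return (at_preset_position(cand, preset_range)
            and appears_first(cand, candidates)
            and has_stock(content, quantity_threshold))
```

Because each check is a separate function, the order of evaluation or the subset of checks used can be changed without touching the others, which matches the note above that the determination sequence is not fixed.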
In one embodiment, as shown in fig. 2, the method further comprises:
S201: training a first preset model based on a first training sample to obtain a first model; the first training sample is a training sample in a first sample set;
S202: training a second preset model based on a second training sample and the first model to obtain a trained second preset model, and taking the trained second preset model as the target model; the second training sample is a training sample in a second sample set; the number of training samples in the second sample set is greater than the number of training samples in the first sample set.
Wherein the first training sample and the second training sample are from different sample sets.
The first training sample is a training sample in the first sample set, the number of the first training samples is not limited in this embodiment, and a plurality of first training samples may be used to train the first preset model.
Any training sample contained in the first sample set and the second sample set may be clause data provided with labeled entity words, such as a starting position and a length of a medical entity word labeled in the clause data.
The number of training samples contained in the first sample set is different from that of the second sample set, and the number of training samples contained in the first sample set is less than that of the training samples in the second sample set.
The first preset model and the second preset model have different structures or architectures, and the second preset model can be a lightweight model. For example, the first preset model may adopt a structure that combines an Enhanced Representation through Knowledge Integration (ERNIE) model with a Conditional Random Field (CRF); this structure has strong feature extraction and generalization capabilities and can achieve good recognition performance even with fewer samples. The second preset model may adopt a structure in which a Gated Recurrent Unit (GRU) neural network is combined with a CRF.
After the first model is obtained through training, the first model can be used as a teacher model, and then a second preset model is trained based on a second training sample and the first model to obtain a trained second preset model.
The training of the second preset model based on the second training sample and the first model may specifically include:
inputting the second training sample into the first model and a second preset model respectively to obtain a first result output by the first model and a second result output by the second preset model;
taking the first result as a soft label, and obtaining a first loss function based on the soft label and the second result; obtaining a second loss function based on the labeling information of the second training sample and the second result;
and updating the second preset model by back-propagation based on the first loss function and the second loss function.
By adopting this scheme, a lightweight target model can be trained by drawing on the recognition capability of the teacher model, so that the target model is easy to deploy and predicts quickly while its accuracy is maintained.
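The two loss terms described above can be illustrated for a single token as follows. This is a simplified sketch under stated assumptions: it uses a per-token softmax instead of a CRF sequence loss, the weighting factor `alpha` is hypothetical because the text does not say how the two losses are combined, and the function names are illustrative only.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    return -sum(t * math.log(p + eps)
                for t, p in zip(target_probs, predicted_probs))


def distillation_loss(teacher_logits, student_logits, gold_label, alpha=0.5):
    # First result (teacher output) is used as the soft label.
    soft_label = softmax(teacher_logits)
    student_probs = softmax(student_logits)
    # First loss: student prediction against the teacher's soft label.
    first_loss = cross_entropy(soft_label, student_probs)
    # Second loss: student prediction against the annotated label.
    one_hot = [1.0 if i == gold_label else 0.0
               for i in range(len(student_logits))]
    second_loss = cross_entropy(one_hot, student_probs)
    # Assumption: the two losses are mixed with a fixed weight alpha.
    return alpha * first_loss + (1 - alpha) * second_loss
```

In a real training loop the combined value would be computed inside an automatic-differentiation framework so that its gradient can be back-propagated into the second preset model only, with the first model kept fixed.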
In one embodiment, the method further comprises:
when a preset period is reached, processing each of the currently saved text objects, that is, performing the processing of S101 to S103 on each of the currently saved text objects. The full set of text objects may be all candidate text objects stored by the first server.
The preset period may be set according to an actual situation, for example, when the number of all currently stored text objects reaches a first threshold, the preset period is determined to be a first period; and under the condition that the quantity of all the currently stored text objects does not reach a first threshold value, determining that the preset period is a second period, wherein the second period is smaller than the first period.
In addition, when the number of all currently stored text objects reaches the first threshold, the preset period is determined to be the first period, and all currently stored text objects are processed each time the first period is reached. The method may further comprise: processing text objects added in real time while the first period has not yet been reached. That is, when the number of all currently saved text objects reaches the first threshold, in addition to periodically processing all currently saved text objects based on the first period, the processing of S101 to S103 may also be performed on text objects as they are added in real time.
The first threshold can be set according to practical situations, for example a value on the order of one hundred million; the first period may also be set according to actual conditions, such as one week (7 days), 15 days, or longer or shorter.
For example, for text objects numbering in the hundreds of millions, all the text objects can be processed in a distributed manner with a first period of 7 days (i.e., at the week level), while streaming real-time processing may be used for the daily incremental data (i.e., text objects added in real time).
And under the condition that the quantity of all the currently stored text objects does not reach a first threshold value, determining that the preset period is a second period, and under the condition that the second period is reached, respectively processing all the currently stored text objects. The second period may be set according to practical situations, and may be, for example, 1 hour, 2 hours, or more or less, and is not limited herein. That is, in a small-scale and frequently updated scene, all the currently saved text objects may be processed separately at a second period (i.e., an hour level).
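The choice between the two periods can be sketched as below. The concrete values (one hundred million objects, 7 days, 1 hour) are taken from the examples in the text, while the function name `choose_schedule` and the returned flag are hypothetical.

```python
from datetime import timedelta

FIRST_THRESHOLD = 100_000_000          # example value: order of one hundred million
FIRST_PERIOD = timedelta(days=7)       # example first period (week level)
SECOND_PERIOD = timedelta(hours=1)     # example second period (hour level)


def choose_schedule(stored_count, first_threshold=FIRST_THRESHOLD):
    """Return (full_reprocessing_period, process_increments_in_real_time)."""
    if stored_count >= first_threshold:
        # Large corpus: infrequent full pass plus streaming for new objects.
        return FIRST_PERIOD, True
    # Small, frequently updated corpus: frequent full pass only.
    return SECOND_PERIOD, False
```

For instance, `choose_schedule(250_000_000)` selects the 7-day period with real-time handling of new text objects, while `choose_schedule(50_000)` selects the 1-hour period.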
After the foregoing processing is completed, a processed text object may be saved on the first server side, which may save the processed text object in an offline article database. The number of the processed text objects may be one or more, and the embodiment does not limit the number.
In one embodiment, the method may further comprise: and responding to a received target text object acquisition request sent by target equipment, selecting a target text object from the processed text objects, and sending the target text object to the target equipment.
The target text object is one of the processed text objects.
The target device may be a terminal device used by a user, for example, any one of a smart phone, a tablet computer, a notebook computer, a personal computer, and the like used by the user.
The target text object obtaining request may include relevant information of the target text object, and the relevant information of the target text object may include at least one of the following: the identification of the target text object, the number of the target text object and the name of the target text object.
Selecting a target text object from the processed text objects in response to receiving a target text object acquisition request sent by a target device, and sending the target text object to the target device, may specifically include: in response to receiving the target text object acquisition request sent by the target device, acquiring the relevant information of the target text object based on the target text object acquisition request; selecting the target text object from the processed text objects based on the relevant information of the target text object; and sending the target text object to the target device.
Therefore, by adopting this scheme, one of the processed text objects, for which the candidate words and their associated display content have already been determined, can be sent to the target device as the target text object, so that a user viewing the target text object on the target device side can more conveniently view the display content associated with the target words it contains.
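A minimal sketch of the selection step is given below. It assumes, purely for illustration, that each processed text object and each request is a dictionary whose optional keys `id`, `number` and `name` stand for the identification, number and name mentioned above; these key names are not from the original text.

```python
def select_target_text_object(request, processed_text_objects):
    """Return the first processed text object matching the request, or None."""
    # Try each kind of relevant information the request may carry.
    for key in ("id", "number", "name"):
        value = request.get(key)
        if value is None:
            continue
        for text_object in processed_text_objects:
            if text_object.get(key) == value:
                return text_object
    return None
```

A production store would normally use an indexed lookup rather than a linear scan; the loop is kept only to make the matching rule explicit.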
In one embodiment, sending the target text object to the target device includes:
checking the candidate words contained in the target text object to obtain a checking result; and adjusting the target text object based on the verification result to obtain the adjusted target text object, and sending the adjusted target text object to the target equipment.
Verifying the candidate words included in the target text object to obtain a verification result may include: acquiring a current candidate word based on the position of the kth candidate word in the target text object; judging whether the current candidate word is consistent with the kth candidate word; if so, the verification result is a pass; otherwise, the verification result is a failure. Here k is an integer greater than or equal to 1. The kth candidate word is any one of the candidate words contained in the target text object; that is, every candidate word is checked in the manner described above, and the details are not repeated for each one.
The adjusting the target text object based on the verification result to obtain the adjusted target text object includes: under the condition that the verification result is that the current candidate word is consistent with the kth candidate word, processing is not performed; and under the condition that the verification result is that the current candidate word is inconsistent with the kth candidate word, deleting the display content related to the kth candidate word, and adjusting the display state of the kth candidate word from the first display state to a second display state.
The second display state may be different from the first display state. The first display state may refer to display in a preset color, and the second display state may refer to display in a normal color. The second display state may also be the display state (or color) of the other words in the text object, that is, the words not determined to be candidate words. The preset color is different from the normal color. The preset color may be set according to actual conditions; for example, it may be red or another color. The normal color may be black or another color. Any choice in which the preset color differs from the normal color falls within the protection scope of this embodiment.
It should be noted that the position of the kth candidate word in the target text object may be obtained from a sentence recognition result output based on the target model when the target text object is subjected to the foregoing processing, and a specific obtaining manner is not described repeatedly here.
By adopting the scheme, the candidate words contained in the target text object can be verified again before the target text object is sent to the target equipment, so that the candidate words in the target text object displayed on the target equipment side are more accurate, and the accuracy of subsequent query and display is ensured.
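The verification and adjustment described above can be sketched as follows, assuming that each candidate word is stored with its character offset in the target text object. The class `CandidateWord` and the state labels `"first"` and `"second"` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateWord:
    word: str                               # the k-th candidate word as saved
    start: int                              # saved position in the text object
    display_state: str = "first"            # e.g. preset color
    display_content: Optional[str] = None   # associated display content


def verify_and_adjust(text: str, candidates: List[CandidateWord]) -> List[bool]:
    """Check every candidate word in place and return the per-word results."""
    results = []
    for cand in candidates:
        # Current candidate word: what the text actually holds at that position.
        current = text[cand.start:cand.start + len(cand.word)]
        passed = current == cand.word
        if not passed:
            # Verification failed: drop the content and revert the display state.
            cand.display_content = None
            cand.display_state = "second"
        results.append(passed)
    return results
```

A check of this kind guards against the text object having been edited after the candidate positions were computed.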
In one embodiment, the method may further comprise: in response to receiving a target word content acquisition request sent by the target equipment, acquiring display content associated with the target word, and sending the display content associated with the target word to the target equipment; wherein the target word is one of the candidate words included in the target text object.
That is to say, the presentation content associated with the candidate word included in the target text object may be stored in the first server, and when the target word content acquisition request sent by the target device is received, the presentation content associated with the target word may be searched based on the target word content acquisition request, and the presentation content may be sent to the target device.
Of course, there may be an implementation manner that, when a target text object is selected from the processed text objects and the target text object is sent to the target device, all candidate words and their associated display contents included in the target text object are sent to the target device. In this way, the target device side may directly respond to the target word content acquisition request and display the display content associated with the target word at a preset position. I.e. in this way the processing of the presentation can be done without the target device interacting with the first server.
Therefore, by adopting this scheme, while the target device is displaying the target text object, the display content associated with any target word can further be provided to the target device, so that the display content associated with the target word is displayed on the target device side in a timely and accurate manner.
Finally, with reference to fig. 3, an exemplary description is made of the information processing method provided by the foregoing first aspect embodiment:
S301: acquiring clauses contained in the text object; the text object may specifically be an article to be processed.
S302: inputting the clauses into a target model to obtain a clause recognition result output by the target model;
S303: taking the entity word as the initial entity word under the condition that the type of the entity word contained in the clause recognition result is a preset type, and determining the position of the initial entity word in the text object based on the clause position information of the entity word contained in the clause recognition result;
S304: in response to matching information existing for the initial entity word, taking the initial entity word as a candidate word;
S305: under the condition that it is determined, based on at least one of the matching content and the candidate word, that a preset condition is met, taking the matching content as display content associated with the candidate word in the text object, and setting the candidate word in the text object to a first display state;
S306: updating the text object based on the display content associated with the candidate word, and saving the updated text object as a processed text object;
S307: in response to receiving a target text object acquisition request sent by a target device, selecting a target text object from the processed text objects;
S308: verifying the candidate words contained in the target text object to obtain a verification result;
S309: adjusting the target text object based on the verification result to obtain the adjusted target text object, and sending the adjusted target text object to the target device.
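The offline part of this flow (S301 to S306) can be sketched end to end as follows. Everything here is an illustrative stand-in: `stub_model` replaces the trained target model with a simple lexicon lookup, clauses are split on sentence punctuation, `knowledge_base` is a hypothetical dictionary supplying matching content, and the preset condition check of S305 is omitted for brevity.

```python
import re


def split_clauses(text):
    # S301: return (clause, offset) pairs; punctuation-based splitting is an assumption.
    return [(m.group(), m.start()) for m in re.finditer(r"[^.!?]+[.!?]?", text)]


def stub_model(clause, lexicon):
    # S302: stand-in for the target model; reports word, type, start and length.
    results = []
    for word, entity_type in lexicon.items():
        idx = clause.find(word)
        if idx != -1:
            results.append({"word": word, "type": entity_type,
                            "start": idx, "length": len(word)})
    return results


def process_text_object(text, lexicon, knowledge_base, preset_types=("drug",)):
    candidates = []
    for clause, offset in split_clauses(text):
        for entity in stub_model(clause, lexicon):
            if entity["type"] not in preset_types:       # S303: type filter
                continue
            position = offset + entity["start"]          # S303: position in text object
            content = knowledge_base.get(entity["word"])  # S304: matching information
            if content is None:
                continue
            candidates.append({                          # S305 (condition check omitted)
                "word": entity["word"],
                "position": position,
                "display_content": content,
                "display_state": "first",
            })
    # S306: the updated text object, ready to be saved as a processed text object.
    return {"text": text, "candidates": candidates}
```

The position of each candidate word is obtained by adding the clause offset to the in-clause start, which mirrors how S303 derives the position in the text object from clause position information.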
An embodiment of the second aspect of the present disclosure further provides an information processing system, as shown in fig. 4, including:
a first server 401, configured to obtain an initial entity word in a text object; responding to the fact that the initial entity words have matching information, taking the initial entity words as candidate words, and taking matching contents as display contents associated with the candidate words, wherein the matching contents correspond to the matching information; and updating the text object based on the display content associated with the candidate words, and saving the updated text object as a processed text object.
The first server 401 is configured to obtain clauses included in the text object; inputting the clauses into a target model to obtain a clause recognition result output by the target model; and under the condition that the type of the entity word contained in the sentence segmentation recognition result is a preset type, taking the entity word as the initial entity word, and determining the position of the initial entity word in the text object based on the sentence segmentation position information of the entity word contained in the sentence segmentation recognition result.
The first server 401 is configured to, when it is determined that a preset condition is met based on at least one of the matching content and the candidate word, use the matching content as display content associated with the candidate word in the text object, and set the candidate word in the text object to be in a first display state.
The first server 401 is configured to perform at least one of the following:
determining that the preset condition is met under the condition that the position of the candidate word in the text object is a preset position;
determining that the preset condition is met under the condition that other candidate words which are the same as the candidate words exist in the text object and the positions of the candidate words in the text object are before the other candidate words;
and determining that the preset condition is met under the condition that the matching content contains the candidate link and the residual quantity of the candidate objects associated with the candidate link is greater than a preset quantity threshold value.
The first server 401 is configured to train a first preset model based on a first training sample to obtain a first model; the first training sample is a training sample in a first sample set; training a second preset model based on a second training sample and the first model to obtain a trained second preset model, and taking the trained second preset model as the target model; the second training sample is a training sample in a second sample set; the number of training samples in the second set of samples is greater than the number of training samples in the first set of samples.
As shown in fig. 5, the system may further include:
the target device 501 is configured to send a target text object acquisition request to the first server, and receive and display a target text object sent by the first server;
the first server 401 is configured to, in response to receiving a target text object obtaining request sent by a target device, select the target text object from the processed text objects, and send the target text object to the target device.
The target device 501 may be a terminal device used by a user, such as any one of a smart phone, a tablet computer, a notebook computer, and a personal computer used by the user.
The first server 401 is configured to verify the candidate word included in the target text object to obtain a verification result; and adjusting the target text object based on the verification result to obtain the adjusted target text object, and sending the adjusted target text object to the target equipment.
Regarding the operation of the target device 501 on the candidate words contained in the target text object, there may be the following two ways, respectively:
Mode 1:
The target device 501 is configured to obtain display content associated with the target word based on a target word content obtaining request; displaying the presentation content in a first window; the target word is one of the candidate words included in the target text object.
In this way, the target text object received by the target device may directly contain the presentation content associated with each candidate word. The target word content obtaining request may be: the user clicks on a target word in the candidate words.
The target device may take any one candidate word in the target text object as the target word and generate the target word content acquisition request when detecting that the user clicks the candidate word; based on a target word content acquisition request, selecting display content associated with the target word from locally stored display content associated with all candidate words; and then, adding the display content related to the target word to the first window and displaying the display content.
Wherein the size of the first window may be smaller than a display area of the target device; the position of the first window may be at a specified position of the target word, such as above, above right, below right, right side, below, etc. of the target word. The display state of the first window may be a translucent state, an opaque state, or the like.
Mode 2:
The target device 501 is configured to obtain display content associated with the target word based on a target word content obtaining request; displaying the presentation content in a first window; the target word is one of the candidate words included in the target text object.
The target device 501 is configured to send the target word content obtaining request to the first server, and receive the display content associated with the target word sent by the first server;
the first server 401 is configured to, in response to receiving a target word content obtaining request sent by the target device, obtain display content associated with the target word, and send the display content associated with the target word to the target device.
In this way, the target text object received by the target device may not contain the presentation content associated with the candidate word. The target word content obtaining request may be: the user clicks on a target word in the candidate words.
The target device 501 may, when it is detected that the user clicks any one candidate word in the target text object, use the candidate word as the target word to generate the target word content acquisition request; sending a target word content acquisition request to the first server; the first server 401 is configured to, in response to receiving a target word content obtaining request sent by the target device, obtain display content associated with the target word, and send the display content associated with the target word to the target device; then, the target device 501 may display the received presentation content associated with the target word in the first window.
Wherein the size of the first window may be smaller than a display area of the target device; the position of the first window may be at a specified position of the target word, such as above, above right, below right, right side, below, etc. of the target word. The display state of the first window may be a translucent state, an opaque state, or the like.
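The difference between the two modes can be sketched as follows: in the first mode the target device answers a click from display content it already holds, and in the second mode it asks the first server. The classes `FirstServer` and `TargetDevice` and their methods are hypothetical, and the network exchange is replaced by a direct method call.

```python
class FirstServer:
    def __init__(self, content_by_word):
        self._content_by_word = dict(content_by_word)

    def get_display_content(self, target_word):
        # Responds to a target word content acquisition request.
        return self._content_by_word.get(target_word)


class TargetDevice:
    def __init__(self, local_content=None, server=None):
        self.local_content = local_content   # set in mode 1
        self.server = server                 # set in mode 2

    def on_click(self, target_word):
        """Return the content to show in the first window for the clicked word."""
        if self.local_content is not None:
            # Mode 1: content arrived together with the target text object.
            return self.local_content.get(target_word)
        # Mode 2: fetch the content from the first server on demand.
        return self.server.get_display_content(target_word)
```

The first mode avoids a round trip per click at the cost of a larger initial payload, while the second keeps the target text object small and always serves the content currently stored on the server.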
Further, the system further comprises:
a second server 502, configured to send the target page content to the target device in response to a target page obtaining request sent by the target device;
the target device 501 is configured to, when the display content includes a target link, send the target page acquisition request to the second server in response to an operation of the target link for the display content displayed in the first window, and receive and display the target page content sent by the second server.
Specifically, the presentation content may or may not include a target link. When the user views the display content associated with the target word displayed in the first window on the target device side, if the display content contains the target link, whether the target link is clicked or not can be determined according to the requirement of the user so as to further acquire the related content. If the user determines to further acquire the related content, the target link may be clicked, and correspondingly, the target device may send the target page acquisition request to the second server in response to an operation of the target link for the display content displayed in the first window; then, the second server 502 is configured to send the target page content to the target device in response to a target page acquisition request sent by the target device; finally, the target device receives and displays the target page content sent by the second server.
The second server may be the same as or different from the first server, and is not limited herein.
The target page content is different according to different actual situations, for example, the target word is a medicine, and correspondingly, the target page content may be an electronic mall capable of purchasing the medicine; the target word is a disease, and correspondingly, the target page content can be a page of a hospital capable of treating the disease. It is not further exhaustive here.
It should be understood that the processing that can be performed by the first server provided in the second aspect embodiment is the same as the information processing method in the foregoing first aspect embodiment, and is not described in detail.
By adopting this scheme, the initial entity words in the text object can be obtained; when matching information exists for an initial entity word, the initial entity word is taken as a candidate word, the display content of the candidate word is determined, and the text object is updated based on the display content associated with the candidate word and saved as a processed text object. Therefore, when the processed text object is displayed, the content associated with the words in the processed text object can be viewed more efficiently and conveniently, and the overall efficiency is improved.
An embodiment of a third aspect of the present disclosure provides an information processing apparatus, as shown in fig. 6, including:
an initial processing module 601, configured to obtain an initial entity word in a text object;
a content processing module 602, configured to, in response to that there is matching information for the initial entity word, use the initial entity word as a candidate word, and use matching content as display content associated with the candidate word, where the matching content corresponds to the matching information;
an updating module 603, configured to update the text object based on the display content associated with the candidate word;
and a storage module 604, configured to store the updated text object as a processed text object.
The initial processing module 601 is configured to obtain clauses included in the text object; inputting the clauses into a target model to obtain a clause recognition result output by the target model; and under the condition that the type of the entity word contained in the sentence segmentation recognition result is a preset type, taking the entity word as the initial entity word, and determining the position of the initial entity word in the text object based on the sentence segmentation position information of the entity word contained in the sentence segmentation recognition result.
The content processing module 602 is configured to, when it is determined that a preset condition is met based on at least one of the matching content and the candidate word, use the matching content as display content associated with the candidate word in the text object, and set the candidate word in the text object to be in a first display state.
The content processing module 602 is configured to perform at least one of the following:
determining that the preset condition is met under the condition that the position of the candidate word in the text object is a preset position;
determining that the preset condition is met if other candidate words identical to the candidate word exist in the text object and the position of the candidate word in the text object is before the other candidate words;
and determining that the preset condition is met under the condition that the matching content contains the candidate link and the residual quantity of the candidate objects associated with the candidate link is greater than a preset quantity threshold value.
On the basis of fig. 6, the apparatus shown in fig. 7 further includes:
a training module 701, configured to train a first preset model based on a first training sample to obtain a first model; the first training sample is a training sample in a first sample set; training a second preset model based on a second training sample and the first model to obtain a trained second preset model, and taking the trained second preset model as the target model; the second training sample is a training sample in a second sample set; the number of training samples in the second set of samples is greater than the number of training samples in the first set of samples.
The device further comprises:
a communication module 702, configured to receive a target text object acquisition request sent by a target device, and send the target text object to the target device;
a selecting module 703, configured to select a target text object from the processed text objects in response to the communication module receiving a target text object acquisition request sent by a target device.
The device further comprises:
the verification module 704 is configured to verify the candidate words included in the target text object to obtain a verification result; adjusting the target text object based on the verification result to obtain the adjusted target text object;
the communication module 702 is configured to send the adjusted target text object to the target device.
The communication module 702 is configured to receive a target word content acquisition request sent by the target device, and send the display content associated with the target word to the target device;
the selecting module 703 is configured to respond to the target word content obtaining request and obtain display content associated with the target word; wherein the target word is one of the candidate words included in the target text object.
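The cooperation of the communication, selecting and verification modules can be sketched as a small in-memory service. The class and method names, the dictionary-based store, and the `is_valid` callback are assumptions for illustration; in particular, the disclosure does not specify what the verification checks, so the sketch simply drops candidate words whose display content fails a caller-supplied test.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessedText:
    text: str
    candidates: dict = field(default_factory=dict)  # candidate word -> display content


class TextService:
    def __init__(self, processed, is_valid):
        self.processed = processed  # text id -> ProcessedText (processed text objects)
        self.is_valid = is_valid    # hypothetical verification rule

    def get_target_text(self, text_id):
        """Handle a target text object acquisition request."""
        target = self.processed[text_id]                 # selecting module
        kept = {w: c for w, c in target.candidates.items()
                if self.is_valid(w, c)}                  # verification module
        return ProcessedText(target.text, kept)          # adjusted target text object

    def get_word_content(self, text_id, word):
        """Handle a target word content acquisition request."""
        return self.processed[text_id].candidates.get(word)


service = TextService(
    {"t1": ProcessedText(
        "Buy a phone case.",
        {"phone": {"summary": "A handset", "available": True},
         "case": {"summary": "A cover", "available": False}},
    )},
    is_valid=lambda word, content: content["available"],
)
adjusted = service.get_target_text("t1")
```

Verifying at request time rather than at processing time means a stored text object can be adjusted to reflect changes that occurred after it was saved.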
It should be understood that the processing that can be performed by the information processing apparatus provided in the embodiment of the third aspect is the same as the information processing method in the embodiment of the first aspect, and is not repeated here.
By adopting the scheme, the text object can be processed to obtain the candidate words in the clauses, further, under the condition that the candidate words contain the matching information, the display content of the candidate words is determined, and the candidate words and the display content thereof contained in the text object are associated and stored as the processed text object; therefore, when the processed text object is displayed, the associated content of words in the processed text object can be checked more efficiently and conveniently, and the overall efficiency is improved.
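The overall scheme summarized above can be sketched end to end. The `MATCH_INDEX` dictionary standing in for the source of matching information, the dictionary layout of the processed text object, and the in-memory `STORE` are all assumptions for illustration, not structures defined in the disclosure.

```python
# Hypothetical matching information: entity word -> matching content.
MATCH_INDEX = {
    "Beijing": {"summary": "Capital of China", "link": "https://example.com/beijing"},
}

STORE = {}  # hypothetical storage for processed text objects


def process_text_object(text, entity_words, match_index):
    """entity_words: (word, position) pairs for the initial entity words."""
    candidates = []
    for word, position in entity_words:
        content = match_index.get(word)
        if content is None:
            continue  # no matching information: not a candidate word
        candidates.append({"word": word, "position": position, "display": content})
    # The updated text object keeps the text together with its associated display content.
    return {"text": text, "candidates": candidates}


def save(text_id, processed):
    STORE[text_id] = processed


text = "Baidu is based in Beijing."
processed = process_text_object(text, [("Baidu", 0), ("Beijing", 18)], MATCH_INDEX)
save("doc-1", processed)
```

Because the display content is stored with the text object, a later request for the content of a word needs only a lookup rather than a fresh match.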
In the technical solution of the present disclosure, the acquisition, storage, and application of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the electronic device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the electronic device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 801 executes the respective methods and processes described above. For example, in some embodiments, the various methods described above may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more of the steps of the respective methods described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the various methods described above in any other suitable manner (e.g., by way of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and the server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (29)

1. An information processing method, comprising:
acquiring initial entity words in the text object;
responding to the fact that the initial entity words have matching information, taking the initial entity words as candidate words, and taking matching contents as display contents associated with the candidate words, wherein the matching contents correspond to the matching information;
and updating the text object based on the display content associated with the candidate words, and saving the updated text object as a processed text object.
2. The method of claim 1, wherein the obtaining initial entity words in a text object comprises:
acquiring clauses contained in the text object;
inputting the clauses into a target model to obtain a clause recognition result output by the target model;
and, in a case where the type of an entity word contained in the clause recognition result is a preset type, taking the entity word as the initial entity word, and determining the position of the initial entity word in the text object based on the position information of the entity word within the clause, as contained in the clause recognition result.
3. The method of claim 2, wherein the using the matching content as the presentation content associated with the candidate word comprises:
and under the condition that at least one of the matching content and the candidate word is determined to meet a preset condition, taking the matching content as display content associated with the candidate word in the text object, and setting the candidate word in the text object to be in a first display state.
4. The method of claim 3, further comprising at least one of:
determining that the preset condition is met under the condition that the position of the candidate word in the text object is a preset position;
determining that the preset condition is met under the condition that other candidate words which are the same as the candidate words exist in the text object and the positions of the candidate words in the text object are before the other candidate words;
and determining that the preset condition is met under the condition that the matching content contains the candidate link and the residual quantity of the candidate objects associated with the candidate link is greater than a preset quantity threshold value.
5. The method of any of claims 2-4, further comprising:
training a first preset model based on a first training sample to obtain a first model; the first training sample is a training sample in a first sample set;
training a second preset model based on a second training sample and the first model to obtain a trained second preset model, and taking the trained second preset model as the target model; the second training sample is a training sample in a second sample set; the number of training samples in the second set of samples is greater than the number of training samples in the first set of samples.
6. The method of any of claims 1-5, further comprising:
and responding to a received target text object acquisition request sent by target equipment, selecting a target text object from the processed text objects, and sending the target text object to the target equipment.
7. The method of claim 6, wherein the sending the target text object to the target device comprises:
checking the candidate words contained in the target text object to obtain a checking result; and adjusting the target text object based on the verification result to obtain the adjusted target text object, and sending the adjusted target text object to the target equipment.
8. The method of claim 6, further comprising:
in response to receiving a target word content acquisition request sent by the target equipment, acquiring display content associated with the target word, and sending the display content associated with the target word to the target equipment; wherein the target word is one of the candidate words included in the target text object.
9. An information processing system comprising:
the first server is used for acquiring initial entity words in the text object; responding to the fact that the initial entity words have matching information, taking the initial entity words as candidate words, and taking matching contents as display contents associated with the candidate words, wherein the matching contents correspond to the matching information; and updating the text object based on the display content associated with the candidate words, and saving the updated text object as a processed text object.
10. The system of claim 9, wherein the first server is configured to: obtain clauses contained in the text object; input the clauses into a target model to obtain a clause recognition result output by the target model; and, in a case where the type of an entity word contained in the clause recognition result is a preset type, take the entity word as the initial entity word, and determine the position of the initial entity word in the text object based on the position information of the entity word within the clause, as contained in the clause recognition result.
11. The system according to claim 10, wherein the first server is configured to, in a case that it is determined that a preset condition is met based on at least one of the matching content and the candidate word, use the matching content as the presentation content associated with the candidate word in the text object, and set the candidate word in the text object to be in a first presentation state.
12. The system of claim 11, wherein the first server is configured to perform at least one of:
determining that the preset condition is met under the condition that the position of the candidate word in the text object is a preset position;
determining that the preset condition is met under the condition that other candidate words which are the same as the candidate words exist in the text object and the positions of the candidate words in the text object are before the other candidate words;
and determining that the preset condition is met under the condition that the matching content contains the candidate link and the residual quantity of the candidate objects associated with the candidate link is greater than a preset quantity threshold value.
13. The system according to any one of claims 10 to 12, wherein the first server is configured to train a first preset model based on a first training sample to obtain a first model; the first training sample is a training sample in a first sample set; training a second preset model based on a second training sample and the first model to obtain a trained second preset model, and taking the trained second preset model as the target model; the second training sample is a training sample in a second sample set; the number of training samples in the second set of samples is greater than the number of training samples in the first set of samples.
14. The system of any of claims 9-13, further comprising:
the target equipment is used for sending a target text object acquisition request to the first server, receiving and displaying the target text object sent by the first server;
and the first server is used for responding to a received target text object acquisition request sent by target equipment, selecting the target text object from the processed text objects and sending the target text object to the target equipment.
15. The system of claim 14, wherein the first server is configured to verify the candidate word included in the target text object to obtain a verification result; and adjusting the target text object based on the verification result to obtain the adjusted target text object, and sending the adjusted target text object to the target equipment.
16. The system of claim 14, wherein,
the target equipment is used for acquiring display content related to the target words based on a target word content acquisition request; displaying the display content in a first window; the target word is one of the candidate words included in the target text object.
17. The system of claim 16, wherein,
the target device is used for sending the target word content acquisition request to the first server and receiving the display content related to the target word sent by the first server;
the first server is configured to, in response to receiving the target word content acquisition request sent by the target device, acquire the presentation content associated with the target word, and send the presentation content associated with the target word to the target device.
18. The system of claim 16 or 17, further comprising:
the second server is used for responding to a target page acquisition request sent by the target equipment and sending the target page content to the target equipment;
the target device is configured to, in a case that the display content includes a target link, send the target page acquisition request to the second server in response to an operation on the target link of the display content displayed in the first window, and receive and display the target page content sent by the second server.
19. An information processing apparatus includes:
the initial processing module is used for acquiring initial entity words in the text object;
the content processing module is used for responding to the fact that the initial entity words have matching information, using the initial entity words as candidate words and using matching content as display content related to the candidate words, wherein the matching content corresponds to the matching information;
the updating module is used for updating the text object based on the display content associated with the candidate words;
and the storage module is used for storing the updated text object as a processed text object.
20. The apparatus of claim 19, wherein the initial processing module is configured to: obtain clauses contained in the text object; input the clauses into a target model to obtain a clause recognition result output by the target model; and, in a case where the type of an entity word contained in the clause recognition result is a preset type, take the entity word as the initial entity word, and determine the position of the initial entity word in the text object based on the position information of the entity word within the clause, as contained in the clause recognition result.
21. The apparatus of claim 20, wherein the content processing module is configured to, in a case that it is determined that a preset condition is met based on at least one of the matching content and the candidate word, use the matching content as display content associated with the candidate word in the text object, and set the candidate word in the text object to be in a first display state.
22. The apparatus of claim 21, wherein the content processing module is configured to perform at least one of:
determining that the preset condition is met under the condition that the position of the candidate word in the text object is a preset position;
determining that the preset condition is met if other candidate words identical to the candidate word exist in the text object and the position of the candidate word in the text object is before the other candidate words;
and determining that the preset condition is met under the condition that the matched content comprises the candidate link and the residual quantity of the candidate objects associated with the candidate link is greater than a preset quantity threshold value.
23. The apparatus of any of claims 20-22, further comprising:
the training module is used for training a first preset model based on a first training sample to obtain a first model; the first training sample is a training sample in a first sample set; training a second preset model based on a second training sample and the first model to obtain a trained second preset model, and taking the trained second preset model as the target model; the second training sample is a training sample in a second sample set; the number of training samples in the second set of samples is greater than the number of training samples in the first set of samples.
24. The apparatus of any of claims 19-23, further comprising:
the communication module is used for receiving a target text object acquisition request sent by target equipment and sending the target text object to the target equipment;
and the selecting module is used for selecting the target text object from the processed text objects in response to the communication module receiving a target text object acquisition request sent by the target equipment.
25. The apparatus of claim 24, further comprising:
the verification module is used for verifying the candidate words contained in the target text object to obtain a verification result; adjusting the target text object based on the verification result to obtain the adjusted target text object;
the communication module is used for sending the adjusted target text object to the target equipment.
26. The apparatus of claim 25, wherein,
the communication module is used for receiving a target word content acquisition request sent by the target equipment and sending the display content related to the target word to the target equipment;
the selection module is used for responding to the target word content acquisition request and acquiring display content related to the target word; wherein the target word is one of the candidate words included in the target text object.
27. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
28. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
29. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
CN202210176449.8A 2022-02-25 2022-02-25 Information processing method, system, device, electronic equipment and storage medium Pending CN114579701A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210176449.8A CN114579701A (en) 2022-02-25 2022-02-25 Information processing method, system, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210176449.8A CN114579701A (en) 2022-02-25 2022-02-25 Information processing method, system, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114579701A true CN114579701A (en) 2022-06-03

Family

ID=81770992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210176449.8A Pending CN114579701A (en) 2022-02-25 2022-02-25 Information processing method, system, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114579701A (en)

Similar Documents

Publication Publication Date Title
CN112926308B (en) Method, device, equipment, storage medium and program product for matching text
CN113553412B (en) Question-answering processing method, question-answering processing device, electronic equipment and storage medium
US20230114673A1 (en) Method for recognizing token, electronic device and storage medium
EP3992814A2 (en) Method and apparatus for generating user interest profile, electronic device and storage medium
CN115546488A (en) Information segmentation method, information extraction method and training method of information segmentation model
CN113392218A (en) Training method of text quality evaluation model and method for determining text quality
CN112906368A (en) Industry text increment method, related device and computer program product
CN114647739B (en) Entity chain finger method, device, electronic equipment and storage medium
CN113792230B (en) Service linking method, device, electronic equipment and storage medium
CN108733702B (en) Method, device, electronic equipment and medium for extracting upper and lower relation of user query
CN115759100A (en) Data processing method, device, equipment and medium
CN111507109A (en) Named entity identification method and device of electronic medical record
CN112905743B (en) Text object detection method, device, electronic equipment and storage medium
CN114492370A (en) Webpage identification method and device, electronic equipment and medium
CN114048315A (en) Method and device for determining document tag, electronic equipment and storage medium
CN114579701A (en) Information processing method, system, device, electronic equipment and storage medium
CN114547252A (en) Text recognition method and device, electronic equipment and medium
CN113806541A (en) Emotion classification method and emotion classification model training method and device
CN113807920A (en) Artificial intelligence based product recommendation method, device, equipment and storage medium
CN116226478B (en) Information processing method, model training method, device, equipment and storage medium
CN113239296B (en) Method, device, equipment and medium for displaying small program
CN112989797B (en) Model training and text expansion methods, devices, equipment and storage medium
CN115033701B (en) Text vector generation model training method, text classification method and related device
CN113627197B (en) Text intention recognition method, device, equipment and storage medium
CN115080845A (en) Recommendation reason generation method and device, electronic device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination