CN111814461A - Text processing method, related device and readable storage medium - Google Patents


Info

Publication number
CN111814461A
Authority
CN
China
Prior art keywords
text
processed
character
determining
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010656329.9A
Other languages
Chinese (zh)
Other versions
CN111814461B (en)
Inventor
王硕
盛志超
郭冬杰
李�浩
李永帅
段纪丁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Application filed by iFlytek Co Ltd
Priority to CN202010656329.9A
Publication of CN111814461A
Application granted
Publication of CN111814461B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Character Input (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a text processing method, a related device, and a readable storage medium. After a text to be processed is obtained, the object set contained in the text is determined; for each object in the object set, the attribute corresponding to the object is determined, and the attribute and the object are combined to obtain a target object. Processing the text and identifying target objects in this way saves labor and time compared with manual identification. Furthermore, the object attributes make clear what each target object specifically refers to, so the identified target objects can be more accurate.

Description

Text processing method, related device and readable storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a text processing method, a related device, and a readable storage medium.
Background
In some scenarios, an object often needs to be identified from text. For example, in the judicial field, stolen goods need to be identified from the documents in a case file (e.g., a prosecution opinion, an appraisal report, an interrogation record, an inspection record, an identification record, etc.).
At present, objects are mostly identified from text manually; however, manual identification consumes considerable manpower and time, and is both inefficient and inaccurate.
Disclosure of Invention
In view of the foregoing problems, the present application provides a text processing method, a related device, and a readable storage medium. The specific scheme is as follows:
a text processing method, comprising:
acquiring a text to be processed;
determining a set of objects contained in the text to be processed;
and aiming at each object in the object set, determining the attribute corresponding to the object, and combining the attribute and the object to obtain a target object.
Optionally, the determining the set of objects included in the text to be processed includes:
determining character-level features of each character in the text to be processed, and text-level features of the text to be processed;
splicing the character level characteristics of each character in the text to be processed with the text level characteristics of the text to be processed to obtain the characteristics of the spliced characters;
identifying the characteristics of the spliced characters to obtain an object identification result of each character;
and determining the object set contained in the text to be processed based on the object recognition result of each character.
Optionally, the determining, for each object in the set of objects, an attribute corresponding to the object includes:
obtaining a dependency syntax relation among all characters in the text to be processed;
for each character in the text to be processed, determining object attribute characteristics of the character according to character level characteristics of the character, object recognition results of the character and dependency syntactic relations among the characters in the text to be processed;
identifying the object attribute characteristics of each character in the text to be processed, and determining the attribute corresponding to each object in the object set.
Optionally, the determining, for each character in the text to be processed, an object attribute feature of the character according to the character-level feature of the character, the object recognition result of the character, and a dependency syntax relationship between respective characters in the text to be processed includes:
generating object recognition characteristics of the characters according to the character level characteristics of the characters and the object recognition results of the characters;
and determining the object attribute characteristics of the characters according to the object identification characteristics of the characters in the text to be processed and the dependency syntactic relation among the characters in the text to be processed.
Optionally, the number of the texts to be processed is multiple, and the method further includes:
and performing the association of the same object on the target object corresponding to each text to be processed.
Optionally, the associating the target objects corresponding to the texts to be processed with the same object includes:
determining two target objects to be judged from target objects corresponding to the texts to be processed, wherein the two target objects to be judged are respectively contained in different texts to be processed;
judging whether the two target objects to be judged are matched or not;
and if the two target objects to be judged are matched, determining that the two target objects to be judged are the same object.
Optionally, the determining whether the two target objects to be determined are matched includes:
and processing the two target objects to be judged by using a matching judgment model to obtain a judgment result of whether the two target objects to be judged, which are output by the matching judgment model, are matched, wherein the matching judgment model is obtained by taking a target object pair as a training sample and taking a judgment result of whether the target object pair is labeled to be matched as a sample label for training.
Optionally, the processing, by using a matching determination model, of processing the two target objects to be determined to obtain a determination result, output by the matching determination model, of whether the two target objects to be determined match or not, includes:
comparing the two target objects to be judged by utilizing a first matching judgment module of the matching judgment model to obtain a first matching judgment result;
comparing the same object attributes in the two target objects to be judged by utilizing a second matching judgment module of the matching judgment model to obtain a second matching judgment result;
and determining whether the two target objects to be determined are matched or not based on the first matching determination result and the second matching determination result by using a comprehensive matching determination module of the matching determination model.
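As an illustration of the association steps above, the following minimal sketch compares target objects taken from different texts. It is not the trained matching judgment model of the application: the functions `overall_match`, `attribute_match` and `is_same_object` are hypothetical rule-based stand-ins for the first, second and comprehensive matching judgment modules, and the dictionary layout of a target object is likewise an assumption.

```python
from itertools import combinations

def overall_match(a, b):
    """First-module stand-in (assumption): compare the object names."""
    return a["object"] == b["object"]

def attribute_match(a, b):
    """Second-module stand-in (assumption): compare attributes both objects have."""
    shared = set(a["attrs"]) & set(b["attrs"])
    return all(a["attrs"][key] == b["attrs"][key] for key in shared)

def is_same_object(a, b):
    """Comprehensive stand-in (assumption): both partial results must agree."""
    return overall_match(a, b) and attribute_match(a, b)

def associate(targets):
    """Return index pairs of target objects judged to be the same object.

    Only pairs whose members come from different texts are compared.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(targets), 2):
        if a["text_id"] != b["text_id"] and is_same_object(a, b):
            pairs.append((i, j))
    return pairs

targets = [
    {"text_id": 0, "object": "electric vehicle", "attrs": {"color": "red", "brand": "Yadi"}},
    {"text_id": 0, "object": "electric vehicle", "attrs": {"color": "black", "brand": "Aima"}},
    {"text_id": 1, "object": "electric vehicle", "attrs": {"color": "red"}},
    {"text_id": 1, "object": "computer", "attrs": {"color": "grey"}},
]
print(associate(targets))
```

In this toy data only the red electric vehicle of text 0 and the red electric vehicle of text 1 are associated; a trained model would replace the exact string comparisons.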
A text processing apparatus comprising:
the acquisition unit is used for acquiring a text to be processed;
the object set determining unit is used for determining an object set contained in the text to be processed;
and the target object determining unit is used for determining the attribute corresponding to each object in the object set and combining the attribute with the object to obtain the target object.
Optionally, the object set determining unit includes:
the characteristic determining unit is used for determining the character-level characteristic of each character in the text to be processed and the text-level characteristic of the text to be processed;
the character splicing unit is used for splicing the character-level characteristics of each character in the text to be processed with the text-level characteristics of the text to be processed to obtain the characteristics of the spliced characters;
the characteristic identification unit is used for identifying the characteristics of the spliced characters to obtain an object identification result of each character;
and the object set determining subunit is used for determining an object set contained in the text to be processed based on the object recognition result of each character.
Optionally, the target object determining unit includes:
the dependency syntax relationship obtaining unit is used for obtaining the dependency syntax relationship among all characters in the text to be processed;
the object attribute feature determining unit is used for determining the object attribute features of the characters according to the character level features of the characters, the object recognition results of the characters and the dependency syntax relationship among the characters in the text to be processed aiming at each character in the text to be processed;
and the object attribute feature identification unit is used for identifying the object attribute features of all characters in the text to be processed and determining the attribute corresponding to each object in the object set.
Optionally, the object property feature determining unit includes:
the object recognition feature determining unit is used for generating the object recognition features of the characters according to the character-level features of the characters and the object recognition results of the characters;
and the object attribute characteristic determining subunit is used for determining the object attribute characteristics of the characters according to the object identification characteristics of the characters in the text to be processed and the dependency syntax relationship among the characters in the text to be processed.
Optionally, the number of the texts to be processed is multiple, and the apparatus further includes:
and the object association unit is used for associating the target objects corresponding to the texts to be processed with the same object.
Optionally, the object associating unit includes:
the target object determining unit to be determined is used for determining two target objects to be determined from the target objects corresponding to the texts to be processed, wherein the two target objects to be determined are respectively contained in different texts to be processed;
the judging unit is used for judging whether the two target objects to be judged are matched or not; and if the two target objects to be judged are matched, determining that the two target objects to be judged are the same object.
Optionally, the determining unit is specifically configured to:
and processing the two target objects to be judged by using a matching judgment model to obtain a judgment result of whether the two target objects to be judged, which are output by the matching judgment model, are matched, wherein the matching judgment model is obtained by taking a target object pair as a training sample and taking a judgment result of whether the target object pair is labeled to be matched as a sample label for training.
Optionally, the processing, by using a matching determination model, of processing the two target objects to be determined to obtain a determination result, output by the matching determination model, of whether the two target objects to be determined match or not, includes:
comparing the two target objects to be judged by utilizing a first matching judgment module of the matching judgment model to obtain a first matching judgment result;
comparing the same object attributes in the two target objects to be judged by utilizing a second matching judgment module of the matching judgment model to obtain a second matching judgment result;
and determining whether the two target objects to be determined are matched or not based on the first matching determination result and the second matching determination result by using a comprehensive matching determination module of the matching determination model.
A text processing apparatus comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the text processing method.
A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the text processing method as described above.
By means of the technical scheme, after the text to be processed is obtained, an object set contained in the text to be processed is determined, for each object in the object set, an attribute corresponding to the object is determined, and the attribute and the object are combined to obtain a target object. According to the method for processing the text and identifying the target object, compared with a manual mode, labor and time can be saved. Furthermore, in the application, the specific reference relation of the target object can be clarified through different object attributes, so that the accuracy of the identified target object can be higher.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a schematic flowchart of a text processing method disclosed in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an object recognition model disclosed in an embodiment of the present application;
FIG. 3 is a diagram illustrating dependency syntax relations between characters in a text according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of another text processing method disclosed in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a matching judgment model disclosed in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present application;
FIG. 7 is a block diagram of a hardware structure of a text processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Next, a text processing method provided by the present application will be described by the following embodiments.
Referring to fig. 1, fig. 1 is a schematic flowchart of a text processing method disclosed in an embodiment of the present application, where the method may include the following steps:
Step S101: acquiring a text to be processed.
In the present application, the text to be processed may be composed of words expressed in any written language (e.g., Chinese, English, etc.). The text to be processed may be a sentence, a paragraph, or a chapter, and the application is not limited thereto.
It should be noted that the text to be processed may be a text obtained through technologies such as speech recognition, picture recognition, and input method recognition, or may be a document with a specific format, and the application is not limited in any way.
For ease of understanding, the present application gives the following example of a text to be processed:
"On March 4, Zhang San stole two electric vehicles, one red Yadi and one black Aima, at the fertilizer plant; on March 16, he stole a grey Lenovo computer in the Longqing district, and then stole a Honda CB400 motorcycle."
Step S102: determining the object set contained in the text to be processed.
In this application, an object set includes at least one object. An object may be a span of characters in the text that shares a certain type of commonality; for example, an object may be a physical item appearing in the text, or a person name, a place name, or the like appearing in the text. The present application is not limited to this.
To facilitate understanding of the object set, the present application gives the following example:
Suppose the text to be processed is "On March 4, Zhang San stole two electric vehicles, one red Yadi and one black Aima, at the fertilizer plant; on March 16, he stole a grey Lenovo computer in the Longqing district, and then stole a Honda CB400 motorcycle." If the objects are physical items, the object set contained in the text to be processed is "electric vehicle, computer, motorcycle".
It should be noted that, the specific implementation manner of determining the object set included in the text to be processed will be described in detail through the following embodiments.
Step S103: for each object in the object set, determining the attribute corresponding to the object, and combining the attribute and the object to obtain a target object.
In the present application, different objects have different object attributes; for example, when the object is a physical item, the object attribute may be a color, a brand, a model, and the like. A target object is an object together with its object attributes. For ease of understanding, assume the text to be processed is "On March 4, Zhang San stole two electric vehicles, one red Yadi and one black Aima, at the fertilizer plant; on March 16, he stole a grey Lenovo computer in the Longqing district, and then stole a Honda CB400 motorcycle." The object set contained in the text to be processed is "electric vehicle, computer, motorcycle"; the attributes corresponding to the electric vehicle are red, Yadi, black, and Aima, the attributes corresponding to the computer are grey and Lenovo, and the attributes corresponding to the motorcycle are Honda and CB400. In the present application, combining each object with its attributes yields the following target objects: "red Yadi electric vehicle, black Aima electric vehicle, grey Lenovo computer, Honda CB400 motorcycle".
It should be noted that, for each object in the object set, an attribute corresponding to the object is determined, and the attribute is combined with the object to obtain a specific implementation manner of the target object, which will be described in detail through the following embodiments.
The embodiment discloses a text processing method, which includes the steps of determining an object set contained in a text to be processed after the text to be processed is obtained, determining an attribute corresponding to each object in the object set, and combining the attribute and the object to obtain a target object. According to the method for processing the text and identifying the target object, compared with a manual mode, labor and time can be saved. Furthermore, in the application, the specific reference relation of the target object can be clarified through different object attributes, so that the accuracy of the identified target object can be higher.
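The flow of steps S101 to S103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: `find_objects` and `find_attributes` are hypothetical stand-ins backed by a hard-coded lookup table (`TOY_ATTRIBUTES`), whereas the embodiments described later use trained models for both steps.

```python
# Toy lookup table (assumption): a real system would use trained models here.
TOY_ATTRIBUTES = {
    "electric vehicle": [("red", "Yadi"), ("black", "Aima")],
    "computer": [("grey", "Lenovo")],
    "motorcycle": [("Honda", "CB400")],
}

def find_objects(text):
    """Step S102 stand-in: return the known objects that occur in the text."""
    return [obj for obj in TOY_ATTRIBUTES if obj in text]

def find_attributes(text, obj):
    """Step S103 stand-in: return one attribute tuple per mentioned instance."""
    return TOY_ATTRIBUTES[obj]

def process_text(text):
    """Combine each object with its attributes to obtain target objects."""
    targets = []
    for obj in find_objects(text):
        for attrs in find_attributes(text, obj):
            targets.append(" ".join(attrs + (obj,)))
    return targets

# Step S101: the text to be processed (English rendering of the example).
text = ("On March 4, Zhang San stole two electric vehicles, one red Yadi and "
        "one black Aima, at the fertilizer plant; on March 16, he stole a grey "
        "Lenovo computer, and then stole a Honda CB400 motorcycle.")
for target in process_text(text):
    print(target)
```

The point of the sketch is the separation of concerns: object detection and attribute detection are independent steps, and the target object is simply their combination.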
As an implementable manner, a specific implementation manner of determining an object set included in a text to be processed is disclosed in the present application, and the implementation manner may include the following steps:
Step S201: determining character-level features of each character in the text to be processed, and text-level features of the text to be processed.
In the present application, the character-level feature of each character in the text to be processed may be semantic information of that character. It should be noted that different texts have their own particularities and express objects differently. To improve the accuracy of identifying objects in different texts, the present application further determines the text-level feature of the text to be processed, which may be semantic information of the text to be processed as a whole.
Step S202: splicing (concatenating) the character-level feature of each character in the text to be processed with the text-level feature of the text to be processed to obtain the spliced feature of each character.
For ease of understanding, if the character-level feature of the character "electricity" is c and the text-level feature of the text to be processed is h, the spliced feature of the character "electricity" is c + h, where "+" denotes splicing.
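A minimal sketch of the splicing in step S202 follows, assuming that features are plain lists of numbers and that splicing means concatenation. The function name `splice` and the toy feature values are illustrative assumptions; in practice the features would come from the feature determination modules.

```python
def splice(char_features, text_feature):
    """Append the same text-level feature to every character-level feature."""
    return [c + text_feature for c in char_features]  # list "+" concatenates

# Toy values (assumptions): two characters with 2-dim features, 3-dim text feature.
char_features = [[0.1, 0.2], [0.3, 0.4]]
text_feature = [0.9, 0.8, 0.7]

spliced = splice(char_features, text_feature)
print(spliced)
```

Every character receives the same text-level feature, so each spliced feature has the dimension of c plus the dimension of h.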
Step S203: recognizing the spliced feature of each character to obtain an object recognition result of each character.
It should be noted that, in the present application, the above steps S201 to S203 may be executed based on an object recognition model. As an implementation manner, the text to be processed may be input into the object recognition model, and the object recognition model outputs the object set contained in the text to be processed, where the object recognition model is trained with training texts as training samples and the object sets labeled for the training texts as sample labels.
The following describes a specific implementation manner of determining a set of objects contained in a text to be processed based on an object recognition model in detail.
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of an object recognition model disclosed in an embodiment of the present application, where the object recognition model includes: a character-level feature determination module, a text-level feature determination module, a feature splicing module, and a recognition module.
After the text to be processed is input into the object recognition model, the character-level feature determination module processes the text to be processed and outputs the character-level feature of each character in the text to be processed. The text-level feature determination module processes the text to be processed and outputs the text-level feature of the text to be processed. The character-level feature of each character and the text-level feature of the text are then input into the feature splicing module to obtain the spliced feature of each character in the text to be processed. Finally, the spliced feature of each character is input into the recognition module, which outputs the object recognition result of each character.
The character-level feature determination module may be implemented based on any one of a BERT (Bidirectional Encoder Representations from Transformers) model, a RoBERTa-large Chinese pre-trained model, RoBERTa-wwm-ext, and RoBERTa-wwm-large-ext.
The text-level feature determination module may be implemented based on an LSTM (Long Short-Term Memory) network, which encodes the text to be processed to obtain the text-level feature of the text to be processed. The present application is not limited to this.
The recognition module may include a fully connected layer and a binary classification layer, and the object recognition result of each character is the binary classification result output by the binary classification layer. Finally, the object set contained in the text to be processed can be determined based on the binary classification results output by the binary classification layer.
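The recognition module can be pictured with the following sketch, which assumes a single fully connected layer producing two scores per character ("not object", "object") followed by picking the larger score. The function names, the hand-set weights and the toy features are all assumptions for illustration; the application does not specify layer sizes, and real weights would be learned during training.

```python
def fully_connected(x, weights, bias):
    """One dense layer: returns weights @ x + bias as a list."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def recognize(spliced_features, weights, bias):
    """Binary classification per character: 1 for object, 0 otherwise."""
    labels = []
    for x in spliced_features:
        not_object, is_object = fully_connected(x, weights, bias)
        labels.append(1 if is_object > not_object else 0)
    return labels

# Hand-set parameters (assumptions); row 0 scores "not object", row 1 "object".
weights = [[0.0, 1.0], [1.0, 0.0]]
bias = [0.0, 0.0]
features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.2]]
print(recognize(features, weights, bias))
```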
For ease of understanding, assume the text to be processed is "On March 4, Zhang San stole two electric vehicles, one red Yadi and one black Aima, at the fertilizer plant; on March 16, he stole a grey Lenovo computer in the Longqing district, and then stole a Honda CB400 motorcycle." The binary classification layer outputs 1 when a character belongs to an object and 0 when it does not, so the output of the binary classification layer is 0000000000000001110000000000000000000000000000000000011000000000000000111.
Step S204: determining the object set contained in the text to be processed based on the object recognition result of each character.
Based on the output of the binary classification layer, the object set contained in the text to be processed is determined to be "electric vehicle, computer, motorcycle".
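Step S204 amounts to reading off the runs of consecutive 1s in the per-character output. The sketch below shows one way to do this; the function name `decode_objects` is hypothetical, and the short Chinese sentence is an illustrative assumption rather than the original example text.

```python
def decode_objects(text, labels):
    """Collect each maximal run of '1' labels as one object, without duplicates."""
    if len(text) != len(labels):
        raise ValueError("text and labels must have the same length")
    objects, start = [], None
    for i, flag in enumerate(labels + "0"):  # sentinel closes a trailing run
        if flag == "1" and start is None:
            start = i                        # a run of object characters begins
        elif flag != "1" and start is not None:
            span = text[start:i]             # the run ended just before i
            if span not in objects:
                objects.append(span)
            start = None
    return objects

# Illustrative sentence (assumption): "stole two electric vehicles and a computer".
text = "偷盗两辆电动车和一台电脑"
labels = "000011100011"
print(decode_objects(text, labels))
```

Duplicates are dropped because the result is treated as a set of objects; the order of first appearance is kept for readability.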
As an implementable manner, the present application discloses a specific implementation manner for determining, for each object in the object set, an attribute corresponding to the object, and the specific implementation manner may include the following steps:
Step S301: obtaining the dependency syntax relations among the characters in the text to be processed.
It should be noted that the dependency syntax relationship may be obtained based on a currently commonly used dependency syntax relationship obtaining method, and this is not described in detail in this application.
For ease of understanding, assuming that the text to be processed is "Zhang San stole two electric vehicles, one red Yadi and one black Aima, at the fertilizer plant", the dependency syntax relations among the characters in the text to be processed are shown in FIG. 3.
Step S302: for each character in the text to be processed, determining the object attribute feature of the character according to the character-level feature of the character, the object recognition result of the character, and the dependency syntax relations among the characters in the text to be processed.
As an implementation manner, the determining, for each character in the text to be processed, an object attribute feature of the character according to the character-level feature of the character, the object recognition result of the character, and a dependency syntax relationship between respective characters in the text to be processed includes:
Step S3021: generating the object recognition feature of the character according to the character-level feature of the character and the object recognition result of the character.
In this application, the character-level feature of each character in the text to be processed may be obtained by processing the text to be processed by using the character-level feature determination module of the object recognition model, and is not described herein again. The object recognition result of the character may be obtained by using the recognition module of the object recognition model, where the object recognition result of the character is used to indicate whether the character is an object in the object set, and as an example, if the character is an object in the object set, the object recognition result of the character is 1, and if the character is not an object in the object set, the object recognition result of the character is 0.
As an implementation manner, the character-level feature of the character and the object recognition result of the character may be encoded based on a BiLSTM (Bi-directional Long Short-Term Memory) network, so as to obtain the object recognition feature of the character.
Step S3022: determining the object attribute feature of the character according to the object recognition features of the characters in the text to be processed and the dependency syntax relations among the characters in the text to be processed.
In this application, a specific implementation of determining the object attribute feature of a character according to the dependency syntax relations among the characters in the text to be processed may be as follows. First, according to the dependency syntax relations among the characters in the text to be processed, determine the dependency syntax character corresponding to the character and the dependency syntax feature between the character and its corresponding dependency syntax character. Then, splice the object recognition feature of the character, the object recognition feature of the corresponding dependency syntax character, and the dependency syntax feature between the two, to obtain the object attribute feature of the character.
For ease of understanding, assume that characters x_h and x_i in the text to be processed have a dependency syntax relation r; for character x_h, its corresponding dependency syntax character is x_i. Then the dependency syntax feature of character x_h is u_i = [w_i, w_h, v_r], where w_h is the object recognition feature of x_h, w_i is the object recognition feature of x_i, and v_r is the feature of r.
There are 14 kinds of dependency syntax relations, so a two-dimensional matrix of 14 × 200 can be preset in the application, in which each dependency syntax relation is characterized by a 1 × 200 vector. In the present application, indices (e.g., 0 to 13) of the dependency syntax relations may be preset; after it is determined that characters x_h and x_i have the dependency syntax relation r, the dependency syntax feature between x_h and x_i is determined from the index of r.
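The lookup and splicing just described can be sketched as below. The 14 × 200 shape and the form u_i = [w_i, w_h, v_r] come from the text; everything else is an assumption for illustration: the placeholder matrix values, the assignment of indices 0 and 1 to VOB and COO in `RELATION_INDEX`, the function name `attribute_feature`, and the 4-dimensional toy recognition features.

```python
NUM_RELATIONS, DIM = 14, 200

# Preset 14 x 200 matrix; the values are placeholders (assumption), in practice
# they would be learned or initialized by the model.
relation_matrix = [[0.01 * r] * DIM for r in range(NUM_RELATIONS)]

# Hypothetical index assignment for two of the 14 relations.
RELATION_INDEX = {"VOB": 0, "COO": 1}

def attribute_feature(w_i, w_h, relation):
    """Build u_i = [w_i, w_h, v_r] by looking up v_r and concatenating."""
    v_r = relation_matrix[RELATION_INDEX[relation]]  # 1 x 200 row for relation r
    return w_i + w_h + v_r

# Toy object recognition features (assumption): 4 dimensions each.
w_h = [1.0] * 4
w_i = [2.0] * 4
u_i = attribute_feature(w_i, w_h, "COO")
print(len(u_i))
```

Storing the relation features in one matrix means a relation contributes to the spliced feature only through its index, so the same row is reused for every character pair that shares the relation.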
Step S303: identifying the object attribute characteristics of each character in the text to be processed, and determining the attribute corresponding to each object in the object set.
In the application, the dependency syntax relationship among the characters in the text to be processed is blended into the object attribute characteristics of the characters in the text to be processed, and the attributes corresponding to different objects can be determined based on the dependency syntax relationship among the characters. For example, as shown in fig. 3, the relationship between the electric vehicle and the electric vehicle is object (denoted by VOB shown in fig. 3), the relationship between yadi and emma is parallel (denoted by COO shown in fig. 3), and it can be determined from the VOB and COO that yadi and emma are included in the electric vehicle and parallel.
It should be noted that, as an implementable manner, the step of "determining, for each object in the object set, an attribute corresponding to the object, and combining the attribute with the object to obtain the target object" may be implemented based on a target object determination model. The target object determination model is trained by taking a training text as a training sample and the target objects labeled in the training text as the sample label. The target object determination model is specifically configured to perform the above steps S301 to S303.
In some scenarios, it is often necessary to identify objects of a certain kind in a plurality of texts and to associate the identified objects that are the same object. For example, in the judicial field, a complete stolen-goods chain is one of the necessary conditions for prosecuting a suspect. Therefore, to determine whether the stolen-goods chain of a judicial case file is complete, the stolen goods need to be identified in a plurality of documents of the case file (e.g., the prosecution opinion, appraisal report, inquiry record, inspection record, identification record, etc.), and the identified stolen goods that are the same item need to be associated with one another.
At present, identifying a certain kind of object in a plurality of texts and associating the identified objects that are the same object is mostly done manually. For example, in the judicial field, a judicial practitioner has to identify the stolen goods in the documents of a case file and associate the identified stolen goods that are the same item in order to determine whether the stolen-goods chain of the case file is complete. However, the manual method consumes a lot of manpower and time and is inefficient.
To solve the above problem, another text processing method is disclosed in the present application.
Referring to fig. 4, fig. 4 is a schematic flowchart of another text processing method disclosed in the embodiment of the present application, where the method may include the following steps:
Step S401: acquiring a plurality of texts to be processed.
In the present application, the plurality of texts to be processed may be texts that are associated in some way; for example, they may be a plurality of documents in a judicial case file (e.g., a prosecution opinion, an appraisal report, an inquiry record, an inspection record, an identification record, etc.).
Step S402: determining an object set contained in each text to be processed, determining an attribute corresponding to each object in the object set, and combining the attribute and the object to obtain a target object corresponding to the text to be processed.
It should be noted that, in the present application, for the processing manner of each text to be processed, reference may be made to the related description of step S102 and step S103, and details are not described here again.
Step S403: performing same-object association on the target objects corresponding to the texts to be processed.
In the present application, the target objects corresponding to the texts to be processed are associated with one another to determine which target objects in the different texts are the same object. A specific implementation is described in detail in the following embodiments.
As an implementable manner, the present application discloses a specific implementation of performing same-object association on the combined target objects corresponding to the texts to be processed, which may include the following steps:
Step S501: determining two target objects to be determined from the target objects corresponding to the texts to be processed, wherein the two target objects to be determined are contained in different texts to be processed.
For example, in the judicial field, a complete stolen-goods chain requires that the stolen goods mentioned in the prosecution opinion also appear in the appraisal report, the inquiry record and the identification record; the two target objects to be determined may then be a stolen item contained in the prosecution opinion and a stolen item contained in the appraisal report, respectively. For ease of understanding, the two target objects to be determined may be "a black 48V Yadi electric vehicle" and "a red 48V Yadi electric vehicle".
Step S502: judging whether the two target objects to be judged are matched or not; if there is a match, step S503 is executed, and if there is no match, step S504 is executed.
In the present application, determining whether the two target objects to be determined match may be implemented based on a neural network structure. Specifically, the two target objects to be determined may be processed by a matching determination model, which outputs a determination result indicating whether they match. The matching determination model is trained by taking target object pairs as training samples and the labeled determination result of whether each pair matches as the sample label.
Step S503: determining that the two target objects to be determined are the same object.
Step S504: determining that the two target objects to be determined are not the same object.
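Steps S501 to S504 can be sketched as a loop over pairs of target objects taken from different texts. This is a minimal sketch, not the application's implementation: the function name `associate`, the dictionary layout, and the exact-string comparison used in place of the trained matching determination model are all assumptions.

```python
from itertools import combinations


def associate(targets_by_text, is_match):
    """Return the pairs of target objects from different texts judged to be the same object.

    targets_by_text: dict mapping a text name to its list of target objects.
    is_match: callable standing in for the matching determination model.
    """
    same_object_pairs = []
    # S501: only pair target objects that come from two different texts.
    for (text_a, objs_a), (text_b, objs_b) in combinations(targets_by_text.items(), 2):
        for obj_a in objs_a:
            for obj_b in objs_b:
                if is_match(obj_a, obj_b):          # S502 -> S503: same object
                    same_object_pairs.append(((text_a, obj_a), (text_b, obj_b)))
                # S502 -> S504: otherwise not the same object, nothing is recorded
    return same_object_pairs


targets = {
    "prosecution opinion": ["a black 48V Yadi electric vehicle"],
    "appraisal report": ["a black 48V Yadi electric vehicle",
                         "a red 48V Yadi electric vehicle"],
}
# Exact string equality is a placeholder for the trained model (assumption).
pairs = associate(targets, lambda a, b: a == b)
print(pairs)
```

Passing the matcher as a callable keeps the pairing logic independent of how the match decision is made.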
In another embodiment of the present application, a specific implementation manner of determining whether the two target objects to be determined match based on the matching determination model is described.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a matching determination model disclosed in an embodiment of the present application. The matching determination model includes a first matching determination module, a second matching determination module and a comprehensive matching determination module.
Based on this structure, the process of processing the two target objects to be determined with the matching determination model to obtain the determination result, output by the model, of whether the two target objects match includes:
S601: comparing the two target objects to be determined by using the first matching determination module of the matching determination model to obtain a first matching determination result.
In the application, the feature of each target object to be judged can be determined, and the similarity of the features of the two target objects to be judged is compared to obtain a first matching judgment result.
For each target object to be determined, a specific implementation of determining its feature may be: determining the feature of each character in the target object and the object attribute feature corresponding to each character; weighting the object attribute feature corresponding to each character to obtain a weighted object attribute feature for each character; splicing the feature of each character with its weighted object attribute feature to obtain the spliced feature of each character; and obtaining the feature of the target object from the spliced features of its characters.
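The weighting, splicing and comparison above can be sketched as follows. The text does not fix the weighting scheme, the way the spliced character features are aggregated, or the similarity measure, so the scalar weight, the mean pooling and the cosine similarity used here are assumptions, as are the function names and the toy feature values.

```python
import math


def target_object_feature(char_feats, attr_feats, attr_weight=0.5):
    """Build the spliced feature of each character and one pooled feature for the object."""
    spliced = []
    for c, a in zip(char_feats, attr_feats):
        weighted = [attr_weight * x for x in a]   # weight the object attribute feature
        spliced.append(list(c) + weighted)        # splice it onto the character feature
    dim = len(spliced[0])
    # Mean pooling over characters (assumed aggregation).
    pooled = [sum(row[d] for row in spliced) / len(spliced) for d in range(dim)]
    return spliced, pooled


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy inputs: two characters per object, 2-dim character features, 1-dim attribute features.
spliced_a, feat_a = target_object_feature([[1.0, 0.0], [0.0, 1.0]], [[2.0], [4.0]])
spliced_b, feat_b = target_object_feature([[1.0, 0.0], [0.0, 1.0]], [[2.0], [0.0]])
first_score = cosine(feat_a, feat_b)   # stands in for the first matching determination result
print(first_score)
```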
S602: comparing the same object attributes in the two target objects to be determined by using the second matching determination module of the matching determination model to obtain a second matching determination result.
In the application, the same object attributes in the two target objects to be determined may be determined first; then, for each target object, the features of the characters corresponding to those same object attributes are determined, and the similarity of these features between the two target objects is compared to obtain the second matching determination result.
For ease of understanding, assuming that the two target objects to be determined are "a black 48V Yadi electric vehicle" and "a red 48V Yadi electric vehicle", respectively, the same object attributes in the two target objects are "48V" and "Yadi".
In this application, a specific implementation of determining, for each target object to be determined, the features of the characters corresponding to the same object attributes may be: determining those features from the spliced features of the characters that correspond to the same object attributes in the target object.
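A sketch of the second comparison is given below. It assumes that each target object is available as a mapping from attribute text to the spliced features of the characters of that attribute, that the shared attributes are found by intersecting the attribute texts, and that per-attribute cosine similarities are averaged; none of these choices, nor the function names and toy vectors, come from the original text.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors component-wise."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def second_match_score(attrs_a, attrs_b):
    """Compare only the attributes that both target objects share."""
    shared = sorted(set(attrs_a) & set(attrs_b))
    if not shared:
        return shared, 0.0   # assumption: no shared attribute gives a score of 0
    sims = [cosine(mean_pool(attrs_a[k]), mean_pool(attrs_b[k])) for k in shared]
    return shared, sum(sims) / len(sims)


# Attribute text -> spliced features of its characters (toy values).
obj_a = {"black": [[1.0, 0.0]], "48V": [[0.0, 1.0]], "Yadi": [[1.0, 1.0]]}
obj_b = {"red": [[0.0, 1.0]], "48V": [[0.0, 1.0]], "Yadi": [[1.0, 1.0]]}
shared, second_score = second_match_score(obj_a, obj_b)
print(shared, second_score)
```

Because the colour attributes differ, they are excluded here; the difference between "black" and "red" is only visible to the first, whole-object comparison.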
S603: determining, by using the comprehensive matching determination module of the matching determination model, whether the two target objects to be determined match based on the first matching determination result and the second matching determination result.
In the application, weights for the first matching determination result and the second matching determination result may be preset, and a final matching determination result is obtained by weighting the two results accordingly. Whether the two target objects to be determined match is then decided by comparing the final matching determination result with a preset determination threshold.
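The weighted combination and threshold test can be written in a few lines. The weights 0.5 and 0.5, the threshold 0.8, and the name `final_match` are illustrative assumptions; the text only states that weights and a threshold are preset.

```python
def final_match(first_score, second_score, w_first=0.5, w_second=0.5, threshold=0.8):
    """Combine the two matching results and compare the result with a preset threshold."""
    final_score = w_first * first_score + w_second * second_score
    return final_score, final_score >= threshold


score, matched = final_match(0.9, 1.0)
print(score, matched)
```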
It should be further noted that, after same-object association has been performed on the target objects corresponding to the texts to be processed, other processing may be performed according to the association result. As an implementable manner, the target objects missing from each text may be determined according to the association result. For example, in the judicial field, after the stolen goods in the documents of a case file have been associated, it can be determined whether any stolen goods mentioned in the prosecution opinion are missing from the appraisal report, the inquiry record, the inspection record, the identification record, and so on.
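Determining the missing target objects from an association result can be sketched as below. The representation of the association result as a set of (target object, text name) pairs, the function name `missing_targets`, and the example objects are assumptions made for illustration.

```python
def missing_targets(reference_objects, other_texts, matched):
    """Map each reference target object to the texts in which no same object was found.

    matched: set of (reference_object, text_name) pairs produced by the association step.
    """
    missing = {}
    for obj in reference_objects:
        absent = [name for name in other_texts if (obj, name) not in matched]
        if absent:
            missing[obj] = absent
    return missing


# Target objects taken from one reference text, checked against two other texts.
reference_objects = ["a black 48V Yadi electric vehicle", "a mobile phone"]
other_texts = ["appraisal report", "inquiry record"]
matched = {
    ("a black 48V Yadi electric vehicle", "appraisal report"),
    ("a black 48V Yadi electric vehicle", "inquiry record"),
    ("a mobile phone", "inquiry record"),
}
result = missing_targets(reference_objects, other_texts, matched)
print(result)
```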
The following describes a text processing apparatus disclosed in an embodiment of the present application, and the text processing apparatus described below and the text processing method described above may be referred to in correspondence with each other.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present application. As shown in fig. 6, the text processing apparatus may include:
an obtaining unit 11, configured to obtain a text to be processed;
an object set determining unit 12, configured to determine an object set included in the text to be processed;
and a target object determining unit 13, configured to determine, for each object in the object set, an attribute corresponding to the object, and combine the attribute with the object to obtain a target object.
Optionally, the object set determining unit includes:
a feature determining unit, configured to determine the character-level feature of each character in the text to be processed and the text-level feature of the text to be processed;
a character splicing unit, configured to splice the character-level feature of each character in the text to be processed with the text-level feature of the text to be processed to obtain the feature of each spliced character;
a feature identification unit, configured to identify the features of the spliced characters to obtain an object recognition result for each character;
and an object set determining subunit, configured to determine the object set contained in the text to be processed based on the object recognition result of each character.
Optionally, the target object determining unit includes:
a dependency syntax relationship obtaining unit, configured to obtain the dependency syntax relationships among the characters in the text to be processed;
an object attribute feature determining unit, configured to determine, for each character in the text to be processed, the object attribute feature of the character according to the character-level feature of the character, the object recognition result of the character, and the dependency syntax relationships among the characters in the text to be processed;
and an object attribute feature identification unit, configured to identify the object attribute features of the characters in the text to be processed and determine the attribute corresponding to each object in the object set.
Optionally, the object attribute feature determining unit includes:
an object recognition feature determining unit, configured to generate the object recognition feature of the character according to the character-level feature of the character and the object recognition result of the character;
and an object attribute feature determining subunit, configured to determine the object attribute feature of the character according to the object recognition features of the characters in the text to be processed and the dependency syntax relationships among the characters in the text to be processed.
Optionally, there are a plurality of texts to be processed, and the apparatus further includes:
an object association unit, configured to perform same-object association on the target objects corresponding to the texts to be processed.
Optionally, the object associating unit includes:
a to-be-determined target object determining unit, configured to determine two target objects to be determined from the target objects corresponding to the texts to be processed, wherein the two target objects to be determined are contained in different texts to be processed;
and a judging unit, configured to judge whether the two target objects to be determined match and, if they match, determine that the two target objects to be determined are the same object.
Optionally, the judging unit is specifically configured to:
process the two target objects to be determined by using a matching determination model to obtain a determination result, output by the matching determination model, of whether the two target objects to be determined match, wherein the matching determination model is trained by taking target object pairs as training samples and the labeled determination result of whether each pair matches as the sample label.
Optionally, the process by which the judging unit processes the two target objects to be determined by using the matching determination model to obtain the determination result, output by the matching determination model, of whether the two target objects to be determined match includes:
comparing the two target objects to be determined by using a first matching determination module of the matching determination model to obtain a first matching determination result;
comparing the same object attributes in the two target objects to be determined by using a second matching determination module of the matching determination model to obtain a second matching determination result;
and determining, by using a comprehensive matching determination module of the matching determination model, whether the two target objects to be determined match based on the first matching determination result and the second matching determination result.
Referring to fig. 7, fig. 7 is a block diagram of a hardware structure of a text processing device according to an embodiment of the present application. The hardware structure of the text processing device may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiment of the application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete mutual communication through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention, etc.;
the memory 3 may include a high-speed RAM memory, and may further include a non-volatile memory, such as at least one disk memory;
wherein the memory stores a program and the processor can call the program stored in the memory, the program for:
acquiring a text to be processed;
determining a set of objects contained in the text to be processed;
and for each object in the object set, determining the attribute corresponding to the object, and combining the attribute with the object to obtain a target object.
Optionally, for the refined and extended functions of the program, reference may be made to the description above.
Embodiments of the present application further provide a readable storage medium, where a program suitable for being executed by a processor may be stored, where the program is configured to:
acquiring a text to be processed;
determining a set of objects contained in the text to be processed;
and for each object in the object set, determining the attribute corresponding to the object, and combining the attribute with the object to obtain a target object.
Optionally, for the refined and extended functions of the program, reference may be made to the description above.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in the present description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A method of text processing, comprising:
acquiring a text to be processed;
determining a set of objects contained in the text to be processed;
and for each object in the object set, determining the attribute corresponding to the object, and combining the attribute with the object to obtain a target object.
2. The method of claim 1, wherein the determining the set of objects contained in the text to be processed comprises:
determining character-level features of each character in the text to be processed, and text-level features of the text to be processed;
splicing the character level characteristics of each character in the text to be processed with the text level characteristics of the text to be processed to obtain the characteristics of the spliced characters;
identifying the characteristics of the spliced characters to obtain an object identification result of each character;
and determining the object set contained in the text to be processed based on the object recognition result of each character.
3. The method of claim 2, wherein the determining, for each object in the set of objects, an attribute corresponding to the object comprises:
obtaining a dependency syntax relation among all characters in the text to be processed;
for each character in the text to be processed, determining object attribute characteristics of the character according to character level characteristics of the character, object recognition results of the character and dependency syntactic relations among the characters in the text to be processed;
identifying the object attribute characteristics of each character in the text to be processed, and determining the attribute corresponding to each object in the object set.
4. The method according to claim 3, wherein the determining, for each character in the text to be processed, the object attribute feature of the character according to the character-level feature of the character, the object recognition result of the character, and the dependency syntax relationship between the respective characters in the text to be processed comprises:
generating object recognition characteristics of the characters according to the character level characteristics of the characters and the object recognition results of the characters;
and determining the object attribute characteristics of the characters according to the object identification characteristics of the characters in the text to be processed and the dependency syntactic relation among the characters in the text to be processed.
5. The method according to any one of claims 1 to 4, wherein there are a plurality of texts to be processed, the method further comprising:
performing same-object association on the target objects corresponding to the texts to be processed.
6. The method according to claim 5, wherein the associating the target objects corresponding to the texts to be processed with the same object comprises:
determining two target objects to be judged from target objects corresponding to the texts to be processed, wherein the two target objects to be judged are respectively contained in different texts to be processed;
judging whether the two target objects to be judged are matched or not;
and if the two target objects to be judged are matched, determining that the two target objects to be judged are the same object.
7. The method of claim 6, wherein the determining whether the two target objects to be determined match comprises:
and processing the two target objects to be judged by using a matching judgment model to obtain a judgment result of whether the two target objects to be judged, which are output by the matching judgment model, are matched, wherein the matching judgment model is obtained by taking a target object pair as a training sample and taking a judgment result of whether the target object pair is labeled to be matched as a sample label for training.
8. The method according to claim 7, wherein the processing of the two target objects to be determined by using the matching determination model to obtain a determination result of whether the two target objects to be determined output by the matching determination model match comprises:
comparing the two target objects to be judged by utilizing a first matching judgment module of the matching judgment model to obtain a first matching judgment result;
comparing the same object attributes in the two target objects to be judged by utilizing a second matching judgment module of the matching judgment model to obtain a second matching judgment result;
and determining whether the two target objects to be determined are matched or not based on the first matching determination result and the second matching determination result by using a comprehensive matching determination module of the matching determination model.
9. A text processing apparatus, comprising:
the acquisition unit is used for acquiring a text to be processed;
the object set determining unit is used for determining an object set contained in the text to be processed;
and the target object determining unit is used for determining the attribute corresponding to each object in the object set and combining the attribute with the object to obtain the target object.
10. A text processing apparatus comprising a memory and a processor;
the memory is used for storing programs;
the processor, configured to execute the program, implementing the steps of the text processing method according to any one of claims 1 to 8.
11. A readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the text processing method according to any one of claims 1 to 8.
CN202010656329.9A 2020-07-09 2020-07-09 Text processing method, related equipment and readable storage medium Active CN111814461B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010656329.9A CN111814461B (en) 2020-07-09 2020-07-09 Text processing method, related equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010656329.9A CN111814461B (en) 2020-07-09 2020-07-09 Text processing method, related equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111814461A true CN111814461A (en) 2020-10-23
CN111814461B CN111814461B (en) 2024-05-31

Family

ID=72843145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010656329.9A Active CN111814461B (en) 2020-07-09 2020-07-09 Text processing method, related equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111814461B (en)

Also Published As

Publication number Publication date
CN111814461B (en) 2024-05-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant