CN113761880B - Data processing method for text verification, electronic equipment and storage medium - Google Patents

Info

Publication number
CN113761880B
CN113761880B CN202111310983.5A
Authority
CN
China
Prior art keywords
text
data
target
list
verification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111310983.5A
Other languages
Chinese (zh)
Other versions
CN113761880A (en)
Inventor
刘远
陈旻晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Clp Suzhou Shared Services Co ltd
Beijing Zhongdian Huizhi Technology Co ltd
Original Assignee
Clp Suzhou Shared Services Co ltd
Beijing Zhongdian Huizhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Clp Suzhou Shared Services Co ltd, Beijing Zhongdian Huizhi Technology Co ltd filed Critical Clp Suzhou Shared Services Co ltd
Priority to CN202111310983.5A priority Critical patent/CN113761880B/en
Publication of CN113761880A publication Critical patent/CN113761880A/en
Application granted granted Critical
Publication of CN113761880B publication Critical patent/CN113761880B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/226Validation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a data processing method for text verification, an electronic device and a storage medium, wherein the method comprises the following steps: obtaining a sample text list from a text database; when a keyword consistent with any preset keyword in a preset keyword list exists in any sample text, marking the keyword position in the sample text as a specified starting position and the end position of the sample text as a specified end position, and taking the speech segment between the specified starting position and the specified end position as the target speech segment; taking each sample text in which a target speech segment exists as training set data to construct a training set; inputting the training set into a preset language model for training to obtain a trained language model; and acquiring a knowledge graph of a target text through the trained language model so as to compare the knowledge graph with preset verification data. The method and the device can improve the accuracy and efficiency of comparing structured text data with semi-structured text data.

Description

Data processing method for text verification, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of data processing, in particular to a data processing method for text verification, electronic equipment and a storage medium.
Background
In the prior art, text data is divided into three types: structured text data, random text data and semi-structured text data. In structured text data, the text data at a specific position has a specific meaning and is easily converted into a table structure in a relational database, such as text data in csv format, invoice text data after OCR processing, or settlement statement data in a specific field such as the electric power system. In random text data, the text data at each position has a random meaning, for example, the text of literary works such as news, novels and prose spread on the Internet. Semi-structured text data is intermediate between the two: text data at a specific position may have a specific meaning but is difficult to convert into a table structure in a relational database, for example, the settlement terms in a contract in a specific field such as the electric power system.
In some application scenarios, especially the settlement auditing scenario of an electric power system, structured text data and semi-structured text data need to be compared, that is, it is judged whether the structured data in a settlement document meets the requirements of the semi-structured settlement terms in a contract. Because semi-structured text data is difficult to convert into a table structure of a relational database, the prior art compares it manually, which leads to low efficiency and accuracy of data comparison and affects the data verification process.
Disclosure of Invention
In order to solve the above technical problems, the present application adopts a technical solution of a data processing method, an electronic device, and a storage medium for text verification, where the method includes the steps of:
S100, acquiring m first texts from a first text set of a text database as sample texts, and constructing a sample text list A = (A1, A2, A3, ……, Am), where Ai is the ith sample text, i = 1……m; when a keyword consistent with any preset keyword in the preset keyword list exists in Ai, marking the position of the keyword in Ai as a specified starting position and the end position of Ai as a specified end position, taking the speech segment between the specified starting position and the specified end position as the target speech segment of Ai, and taking each Ai in which a target speech segment exists as training set data to construct a training set;
s200, inputting the training set into a preset language model for training to obtain a trained language model;
S300, obtaining a target text and inputting the target text into the trained language model, and obtaining a target data list B = (B1, B2, B3, ……, Bn) corresponding to the target text, where Bj is the jth target datum, j = 1……n, and n is the number of target data; inserting each Bj in B into a number of preset triple frameworks to acquire a target knowledge graph corresponding to the target text;
s400, acquiring a text ID of a target text, acquiring all verification data corresponding to the text ID of the target text from a verification data list according to the text ID of the target text, and constructing a first intermediate data list by taking each verification data as first intermediate data;
S500, traversing the target knowledge graph, and when any target data in the target knowledge graph is inconsistent with the corresponding first intermediate data in the first intermediate data list, replacing the inconsistent first intermediate data with the corresponding target data.
The present invention also provides a non-transitory computer-readable storage medium that can be configured in an electronic device to store at least one instruction or at least one program for implementing a method of the method embodiments, where the at least one instruction or the at least one program is loaded by a processor and executed to implement the method provided by the above embodiments.
The invention also provides an electronic device comprising a processor and the aforementioned non-transitory computer-readable storage medium.
Compared with the prior art, the invention has obvious advantages and beneficial effects. By the above technical solution, the data processing method for text verification can achieve considerable technical progress and practicability, has wide industrial utilization value, and at least has the following advantages:
the method comprises the steps of obtaining a sample text list, marking a keyword position of the sample text as a designated initial position and marking an end position of the sample text as a designated end position when a keyword consistent with any preset keyword in the preset keyword list exists in the sample text, taking a speech segment between the designated initial position and the designated end position as a target speech segment of the sample text, and taking the sample text based on the target speech segment as training set data to construct a training set; inputting the training set into a preset language model for training to obtain a trained language model;
the language model is optimized so that the target speech segments from which data with specific meaning can be extracted are determined accurately and efficiently, which reduces the extraction of full-text data and the interference of other data and facilitates the comparison of data in the text;
meanwhile, the target text is input into the trained language model to obtain the target data list corresponding to the target text, and the target knowledge graph corresponding to the target text is obtained by inserting each target datum into a number of preset triple frameworks; the data in the semi-structured text can thus be stored in the form of a knowledge graph, which optimizes the storage mode, facilitates the comparison of data in the text, and improves the efficiency and accuracy of the verification between structured text data and semi-structured text data.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical means of the present invention more clearly understood, the present invention may be implemented in accordance with the content of the description, and in order to make the above and other objects, features, and advantages of the present invention more clearly understood, the following preferred embodiments are described in detail with reference to the accompanying drawings.
Drawings
Fig. 1 is a flowchart of a data processing method for text verification according to an embodiment of the present invention.
Detailed Description
To further illustrate the technical means and effects of the present invention adopted to achieve the predetermined objects, the following detailed description will be given with reference to the accompanying drawings and preferred embodiments of a data processing method, an electronic device and a storage medium for text verification according to the present invention.
The embodiment of the invention provides a data processing method for text verification, which further comprises the following steps, as shown in fig. 1:
S100, acquiring m first texts from a first text set of a text database as sample texts, and constructing a sample text list A = (A1, A2, A3, ……, Am), where Ai is the ith sample text, i = 1……m; when a keyword consistent with any preset keyword in the preset keyword list exists in Ai, the position of the keyword in Ai is marked as a specified starting position and the end position of Ai is marked as a specified end position, the speech segment between the specified starting position and the specified end position is taken as the target speech segment of Ai, and each Ai in which a target speech segment exists is taken as training set data to construct a training set.
Specifically, the method further includes the following steps before the step S100:
the text types of all the first texts are obtained, and the first texts of the same type are classified according to preset text division rules to construct a plurality of first text sets.
Preferably, the text division rule refers to a preset rule for dividing the text by the text type of the first text, where the text type of the first text is, for example, a purchase text, a statistical text, or an order text.
Specifically, the first text is a text storing semi-structured text data, and all sample texts in A constructed based on the first text set are texts of the same type, which facilitates training the preset language model, improves the accuracy of model training, and thus improves the accuracy and efficiency of the comparison between structured text data and semi-structured text data.
Specifically, in step S100, the keywords in Ai are determined by a natural language processing method; extracting keywords from the sample text makes it possible to determine the speech segments from which key data can be obtained, improving the accuracy and efficiency of comparing structured text data with semi-structured text data.
Preferably, the preset keyword list is a preset list of keywords whose fields include the keywords corresponding to the text type of any first text, which can be understood as follows: in step S100, Ai is traversed, and according to the text type of Ai, all preset keywords corresponding to that text type are obtained from the preset keyword list as target keywords, so that the keywords of Ai can be compared with all target keywords. This facilitates the comparison of keywords in the sample text and the determination of the speech segments from which key data can be obtained, improving the accuracy and efficiency of the comparison between structured text data and semi-structured text data.
Specifically, the key data refers to data with a local special meaning in the sample text, and the special meaning needs to be determined according to the text type, which is not described herein again.
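The marking of step S100 can be sketched minimally as follows; the function names and the simple substring matching are illustrative assumptions, not the patent's disclosed implementation:

```python
def mark_target_segment(sample_text, preset_keywords):
    """Return the target speech segment of sample_text, or None if no keyword matches."""
    for kw in preset_keywords:
        pos = sample_text.find(kw)  # keyword position = specified starting position
        if pos != -1:
            # the specified end position is the end of the sample text,
            # so the target speech segment runs from the keyword to the end
            return sample_text[pos:]
    return None

def build_training_set(sample_list, preset_keywords):
    """Keep only the sample texts in which a target speech segment exists."""
    segments = (mark_target_segment(t, preset_keywords) for t in sample_list)
    return [s for s in segments if s is not None]
```

For example, with the keyword "settlement", only the sample text containing it contributes its segment to the training set.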
S200, inputting the training set into a preset language model for training to obtain a trained language model.
Specifically, the step S200 further includes the steps of:
S201, inputting each Ai in the training set into the preset language model to obtain the key data corresponding to Ai, and constructing a key data list Si from the key data; in this embodiment, any method in the art for obtaining such values by a language model may be adopted, which is not described herein again;
S203, obtaining the text ID corresponding to Ai, and according to the text ID corresponding to Ai, obtaining all verification data of that text ID from the verification data list as second intermediate data to construct a second intermediate data list;
S205, traversing the key data list corresponding to Ai, and determining the probability value F of A according to the key data list corresponding to Ai and the second intermediate data list corresponding to Ai, wherein F meets the following condition:

F = 1 − (E1 + E2 + …… + Em) / (S1 + S2 + …… + Sm)

wherein Si is the number of key data in the key data list corresponding to Ai, and Ei is the number of data in the key data list corresponding to Ai that are inconsistent with the corresponding second intermediate data in the second intermediate data list;
s207, traversing A, and obtaining a trained language model when F is larger than or equal to a preset probability threshold;
S209, when F is less than the preset probability threshold, re-acquiring a sample text list A′ and iterating according to A′ until F is greater than or equal to the preset probability threshold, so as to obtain the trained language model; the iteration is the process of executing the processing of step S100 on A′ and then re-acquiring the probability corresponding to A′, which is not described herein again.
Further, the text ID refers to a unique identification for identifying the text.
Preferably, the language model is a Bert model.
Preferably, in step S209, A′ may include the same sample texts as A, which can be further understood as follows: when the language model is retrained, the re-acquired A′ needs to be of the same text type as A, and A′ includes the sample texts whose corresponding probability Fi is less than the preset probability threshold and does not include the sample texts whose corresponding probability Fi is greater than or equal to the preset probability threshold, wherein Fi meets the following condition:

Fi = 1 − Ei / Si

wherein Si is the number of key data in the key data list corresponding to Ai, and Ei is the number of those key data inconsistent with the corresponding second intermediate data.
further, the probability threshold range is 90-98%, preferably, the probability threshold is 90%.
In another specific embodiment, the method comprises the following steps: the same sample text list A is obtained without marking target speech segments; each Ai in the training set is input into the preset language model to obtain the key data corresponding to Ai, and a key data list is constructed; the text ID corresponding to Ai is obtained, and according to that text ID, all verification data of the corresponding text ID are obtained from the verification data list as second intermediate data to construct a second intermediate data list; the key data list corresponding to Ai is traversed, and the probability value F′ of A is determined according to the key data list corresponding to Ai and the second intermediate data list corresponding to Ai.

From a large amount of experimental data obtained by the method of this comparative embodiment, in the case of using the same sample text list, F′ is reduced by at least 10% compared with F, that is, the probability corresponding to sample texts without target speech segment marking is at least 10% lower than that corresponding to sample texts with target speech segments marked. This further illustrates that the determination of target speech segments in this embodiment reduces the extraction of full-text data and the interference of other data, facilitating the comparison of data in the text.
S300, obtaining the eyesMarking a text and inputting the target text into a trained language model, and acquiring a target data list B = (B) corresponding to the target text1,B2,B3,……,Bn),BjJ =2 … … n, n is the target data number, and each B in B isjAnd acquiring a target knowledge graph corresponding to the target text by using a plurality of preset triple frameworks.
Specifically, the step S300 further includes the steps of:
All Bj are inserted into each preset triple framework as entities to construct a plurality of knowledge graphs of the target text, and the knowledge graph into which the largest number of Bj are inserted is taken as the target knowledge graph, which can be understood as follows: each text type of the first text corresponds to a plurality of preset triple frameworks, and the knowledge graph constructed with the largest number of Bj is taken as the target knowledge graph, so that a suitable knowledge graph can be quickly constructed to store the data, and at the same time the comparison between the knowledge graph and the verification data, namely the comparison between semi-structured text data and structured text data, is facilitated; the target data refer to data with special meaning in the target text, and the special meaning needs to be determined according to the text type, which is not described herein again.
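A minimal sketch of this selection, assuming each triple framework is a list of (entity-slot, relation, entity-slot) templates and the target data are keyed by slot name (all names are illustrative assumptions):

```python
def fill_framework(framework, target_data):
    """framework: list of (subject_slot, relation, object_slot); target_data: dict slot -> Bj."""
    triples, used = [], 0
    for subj, rel, obj in framework:
        if subj in target_data and obj in target_data:
            triples.append((target_data[subj], rel, target_data[obj]))
            used += 2  # both entities of the triple were filled by target data
    return triples, used

def build_target_graph(frameworks, target_data):
    """Pick the framework into which the largest number of Bj can be inserted."""
    filled = [fill_framework(f, target_data) for f in frameworks]
    return max(filled, key=lambda t: t[1])[0]
```

The framework whose slots absorb the most target data wins, so the resulting graph matches the text type's structure as closely as possible.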
Specifically, the target text refers to any first text in the text database other than the sample texts, and the target text is consistent with the text type of the sample texts in the training set used for training the language model, which can be understood as follows: the target text is consistent with the text type of all sample texts in A, and the target text does not need to be marked with the specified starting position of a speech segment.
S400, acquiring the text ID of the target text, acquiring all verification data corresponding to the text ID of the target text from a verification data list according to the text ID of the target text, and constructing a first intermediate data list by taking each verification data as first intermediate data.
Specifically, the step S400 further includes the steps of:
According to the text ID of the first text, a plurality of second texts corresponding to the text ID of the first text are obtained from the text database; all the second texts are preprocessed, and specified data are obtained from the second texts as the verification data of the first text; a verification data list is constructed from the verification data of all first texts and their text IDs. A second text is a text recording the data used for verifying the corresponding first text, and the second text is a structured text.
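A hypothetical sketch of this construction, assuming the second texts are already parsed into records and `fetch_second_texts` is an illustrative lookup by text ID (neither name comes from the patent):

```python
def build_verification_list(first_text_ids, fetch_second_texts, specified_fields):
    """Return {text_id: {field: value}} built from the structured second texts."""
    verification = {}
    for text_id in first_text_ids:
        record = {}
        for second_text in fetch_second_texts(text_id):  # structured records for this ID
            for field in specified_fields:
                if field in second_text:
                    record[field] = second_text[field]  # keep the specified data only
        verification[text_id] = record
    return verification
```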
S500, traversing the target knowledge graph, and replacing any target data in the target knowledge graph with corresponding target data when the target data is inconsistent with the corresponding first intermediate data in the first intermediate data list.
Specifically, the step S500 further includes the steps of:
s501, traversing the target knowledge graph and acquiring target data corresponding to each entity in a target triple framework from the target knowledge graph, wherein the target triple framework in the step S501 refers to the triple framework corresponding to the target knowledge graph;
S502, according to the entities of the target triple framework, obtaining the first intermediate data corresponding to each entity from the first intermediate data list, which can be understood as follows: an entity in the target triple framework is a field name in the verification data list;
s503, comparing the target data with the corresponding first intermediate data;
and S505, when the target data is inconsistent with the corresponding first intermediate data, replacing the first intermediate data with the corresponding target data.
In this embodiment, the comparison between structured data and semi-structured data can be realized, and the efficiency and accuracy of the verification between structured data and semi-structured data are improved.
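Steps S501 to S505 can be sketched as follows, assuming for illustration that both the target knowledge graph and the first intermediate data list are keyed by entity/field name (a simplification not disclosed by the patent):

```python
def verify_and_replace(target_graph, intermediate):
    """target_graph: dict entity -> target data; intermediate: dict entity -> first intermediate data.
    Returns the updated intermediate data and the entities that mismatched."""
    mismatches = []
    for entity, target_value in target_graph.items():        # S501: traverse the target knowledge graph
        checked = intermediate.get(entity)                   # S502: entity is a field name in the list
        if checked is not None and checked != target_value:  # S503: compare the two values
            intermediate[entity] = target_value              # S505: replace on inconsistency
            mismatches.append(entity)
    return intermediate, mismatches
```

The returned mismatch list makes the verification result auditable: every field where the semi-structured extraction disagreed with the structured record is reported.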
In the method, a sample text list is obtained; when a keyword consistent with any preset keyword in the preset keyword list exists in a sample text, the position of the keyword in the sample text is marked as the specified starting position, the end position of the sample text is marked as the specified end position, and the speech segment between the two positions is taken as the target speech segment of the sample text; each sample text in which a target speech segment exists is taken as training set data to construct a training set; the training set is input into the preset language model for training to obtain the trained language model. The language model is thus optimized, the target speech segments from which data with specific meaning can be extracted are determined accurately and efficiently, the extraction of full-text data and the interference of other data are reduced, and the comparison of data in the text is facilitated.
Meanwhile, the target text is input into the trained language model to obtain the target data list corresponding to the target text, and each target datum is inserted into a number of preset triple frameworks to obtain the target knowledge graph corresponding to the target text; the data in the semi-structured text can thus be stored in the form of a knowledge graph, which optimizes the storage mode, facilitates the comparison of data in the text, and improves the efficiency and accuracy of data verification.
Embodiments of the present application also provide a non-transitory computer-readable storage medium that can be disposed in an electronic device to store at least one instruction or at least one program for implementing a method of the method embodiments, where the at least one instruction or the at least one program is loaded into and executed by a processor to implement the method provided by the above embodiments.
Embodiments of the present application also provide an electronic device comprising a processor and the aforementioned non-transitory computer-readable storage medium.
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A data processing method for text verification, the method further comprising the steps of:
S100, acquiring m first texts from a first text set of a text database as sample texts, and constructing a sample text list A = (A1, A2, A3, ……, Am), where Ai is the ith sample text, i = 1……m; when a keyword consistent with any preset keyword in the preset keyword list exists in Ai, marking the position of the keyword in Ai as a specified starting position and the end position of Ai as a specified end position, taking the speech segment between the specified starting position and the specified end position as the target speech segment of Ai, and taking each Ai in which a target speech segment exists as training set data to construct a training set, wherein the first text refers to a text storing semi-structured data;
s200, inputting the training set into a preset language model for training to obtain a trained language model, wherein the step S200 further comprises the following steps:
S201, inputting each Ai in the training set into a preset language model to obtain the key data corresponding to Ai, and constructing a key data list Si;
S203, obtaining the text ID corresponding to Ai, and according to the text ID corresponding to Ai, obtaining all verification data of that text ID from the verification data list as second intermediate data to construct a second intermediate data list;
S205, traversing the key data list corresponding to Ai, and determining the probability value F of A according to the key data list corresponding to Ai and the second intermediate data list corresponding to Ai, wherein F meets the following condition:

F = 1 − (E1 + E2 + …… + Em) / (S1 + S2 + …… + Sm)

wherein Si is the number of key data in the key data list corresponding to Ai, and Ei is the number of data in the key data list corresponding to Ai that are inconsistent with the corresponding second intermediate data in the second intermediate data list;
s207, traversing A, and obtaining a trained language model when F is larger than or equal to a preset probability threshold;
S209, when F is less than the preset probability threshold, re-acquiring a sample text list A′ and performing iteration according to A′ until F is greater than or equal to the preset probability threshold, so as to obtain the trained language model;
the step S209 includes: A′ may include the same sample texts as A; when the language model is retrained, the re-acquired A′ needs to be of the same text type as A, and A′ includes the sample texts whose corresponding probability Fi is less than the preset probability threshold and does not include the sample texts whose corresponding probability Fi is greater than or equal to the preset probability threshold, wherein Fi meets the following condition:

Fi = 1 − Ei / Si

wherein Si is the number of key data in the key data list corresponding to Ai, and Ei is the number of those key data inconsistent with the corresponding second intermediate data;
S300, obtaining a target text and inputting the target text into the trained language model, and obtaining a target data list B = (B1, B2, B3, ……, Bn) corresponding to the target text, where Bj is the jth target datum, j = 1……n, and n is the number of target data; inserting each Bj in B into a number of preset triple frameworks to acquire a target knowledge graph corresponding to the target text;
s400, acquiring a text ID of a target text, acquiring all verification data corresponding to the text ID of the target text from a verification data list according to the text ID of the target text, and constructing a first intermediate data list by taking each verification data as first intermediate data, wherein the target text refers to any first text except a sample text in a text database;
wherein, the step of S400 further comprises the following steps: according to the text ID of the first text, acquiring a plurality of second texts corresponding to the text ID of the first text from a text database, preprocessing all the second texts, acquiring designated data from the second texts to serve as verification data of the first text, and constructing a verification data list according to the verification data of all the first texts and the text ID of the first text, wherein the second text is a text recorded with data corresponding to the data for verifying the first text, and the second text is a structured text;
S500, traversing the target knowledge graph, and when any target data in the target knowledge graph is inconsistent with the corresponding first intermediate data in the first intermediate data list, replacing the inconsistent first intermediate data with the corresponding target data.
2. The data processing method for text verification according to claim 1, wherein in step S100, the keywords in Ai are determined by a natural language processing method.
3. The data processing method for text verification according to claim 1, further comprising the following steps in the step S300:
all Bj are inserted into each preset triple framework as entities to construct a plurality of knowledge graphs of the target text, and the knowledge graph into which the largest number of Bj are inserted is taken as the target knowledge graph.
4. The data processing method for text verification according to claim 1, wherein the target text refers to any first text in the text database except the sample text.
5. The data processing method for text verification according to claim 1, further comprising the following steps in the step S400:
according to the text ID of the first text, a plurality of second texts corresponding to the text ID of the first text are obtained from a text database, all the second texts are preprocessed, key data are extracted to serve as verification data of the first text, and a verification data list is constructed according to the verification data of all the first texts and the text ID of the first text.
6. The data processing method for text verification according to claim 5, wherein the second text is a text corresponding to the data recorded for verifying the first text.
7. A non-transitory computer readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement the method of any of claims 1-6.
8. An electronic device comprising a processor and the non-transitory computer readable storage medium of claim 7.
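The claims above (notably claim 3 and step S500 of claim 1) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the function names, the representation of a "preset triple framework" as a (relation, filter) pair, and the dict-based graph and verification list are all assumptions introduced for the example.

```python
def build_target_graph(entities, frameworks):
    """Claim 3 sketch: insert all entities (the B_j) into each preset triple
    framework, building one candidate knowledge graph per framework, and
    keep the graph that absorbed the most entities as the target graph."""
    candidates = []
    for relation, fits in frameworks:
        # an entity is inserted as the head of a (head, relation, tail) triple
        # when it satisfies the framework's (assumed) slot filter
        triples = [(entity, relation, None) for entity in entities if fits(entity)]
        candidates.append(triples)
    # the target knowledge graph is the candidate holding the most entities
    return max(candidates, key=len)


def verify_graph(graph, verification):
    """Step S500 sketch: traverse the graph and, wherever a stored value
    disagrees with the verification list, replace it with the verified value."""
    return {
        entity: verification[entity]
        if entity in verification and verification[entity] != value
        else value
        for entity, value in graph.items()
    }
```

A usage sketch: with two hypothetical frameworks, one accepting only keyword-like entities and one accepting everything, the second framework yields the larger graph and is therefore selected; verification then overwrites only the inconsistent entries.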
CN202111310983.5A 2021-11-08 2021-11-08 Data processing method for text verification, electronic equipment and storage medium Active CN113761880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111310983.5A CN113761880B (en) 2021-11-08 2021-11-08 Data processing method for text verification, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113761880A CN113761880A (en) 2021-12-07
CN113761880B true CN113761880B (en) 2022-03-04

Family

ID=78784725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111310983.5A Active CN113761880B (en) 2021-11-08 2021-11-08 Data processing method for text verification, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113761880B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168608B (en) * 2021-12-16 2022-07-15 中科雨辰科技有限公司 Data processing system for updating knowledge graph
CN114297653B (en) * 2021-12-31 2024-09-13 安天科技集团股份有限公司 De-duplication method for derivative data
CN114021200B (en) * 2022-01-07 2022-04-15 每日互动股份有限公司 Data processing system for pkg fuzzification
CN115858208B (en) * 2022-09-29 2024-05-14 杭州中电安科现代科技有限公司 Method for acquiring target data and extracting text list
CN115544974A (en) * 2022-11-28 2022-12-30 药融云数字科技(成都)有限公司 Text data extraction method, system, storage medium and terminal

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200364233A1 (en) * 2019-05-15 2020-11-19 WeR.AI, Inc. Systems and methods for a context sensitive search engine using search criteria and implicit user feedback
CN111753086A (en) * 2020-06-11 2020-10-09 北京天空卫士网络安全技术有限公司 Junk mail identification method and device
CN112860872B (en) * 2021-03-17 2024-06-28 广东电网有限责任公司 Power distribution network operation ticket semantic compliance verification method and system based on self-learning
CN113239208A (en) * 2021-05-06 2021-08-10 广东博维创远科技有限公司 Mark training model based on knowledge graph
CN113254667A (en) * 2021-06-07 2021-08-13 成都工物科云科技有限公司 Scientific and technological figure knowledge graph construction method and device based on deep learning model and terminal



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant