CN116205235B - Data set dividing method and device and electronic equipment - Google Patents

Data set dividing method and device and electronic equipment

Info

Publication number
CN116205235B
Authority
CN
China
Prior art keywords
entity
data set
group
dividing
location
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310491927.9A
Other languages
Chinese (zh)
Other versions
CN116205235A (en
Inventor
宋洒
卢文庆
郭文萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Velocity Insight Technology Co ltd
Original Assignee
Beijing Velocity Insight Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Velocity Insight Technology Co ltd
Priority to CN202310491927.9A
Publication of CN116205235A
Application granted
Publication of CN116205235B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the application disclose a data set dividing method, a data set dividing device and electronic equipment, relating to the technical field of medical treatment. The method comprises: acquiring a data set and labeling the entities in its text units; classifying the data set by entity type and dividing it into a plurality of entity type groups; recording the start location and the end location of each entity in its text unit; dividing the data set into a plurality of entity location groups according to the location information; counting the proportion of each entity location group in the data set; and, for each entity type group, dividing the group into a plurality of subgroups according to the entity location groups, to complete the grouping. The method counts the probability with which entity types and their entity values appear at different positions in the sentences of the data set, groups the data set by entity type, further divides each group into subgroups by entity location, and samples each subgroup according to the proportions of the location groups, so that the training set and the test set are more evenly distributed in terms of entity type, context and location information.

Description

Data set dividing method and device and electronic equipment
Technical Field
The embodiment of the application relates to the technical field of medical treatment, in particular to a data set dividing method, a data set dividing device and electronic equipment.
Background
In named entity recognition tasks in the medical field, many entities are polysemous: the same entity value (that is, the same entity name) may correspond to several entity types. For example, the word "immune" may denote immunotherapy in some contexts and an autoimmune disease in others. To recognize such polysemous named entities in the medical field, it is important to distinguish entity types accurately and to divide the data set accurately.
Traditional data set division considers only the entity type as a feature, ensuring that the distribution proportions of the entity types are basically equal in the training set and the test set. However, because an entity value may represent different entity types in different contexts, a traditional method may place the occurrences of one type of an entity value in the training set and the occurrences of another type of the same value in the test set, resulting in uneven data distribution.
The position of an entity value within a text may also differ. It may lie in different paragraphs of an article, such as the introduction or the closing part; in an early or a late sentence of the same paragraph; or at the beginning or the end of the same sentence. Therefore, if the training and test sets are divided by entity type only, without considering the relative position of the entity values in the text, the data sets are unevenly distributed, which affects the model evaluation metrics and makes it difficult to recognize polysemous named entities accurately.
Disclosure of Invention
The embodiment of the application provides a data set dividing method, a data set dividing device and electronic equipment, which can solve the problem of uneven data distribution.
In a first aspect, an embodiment of the present application provides a data set partitioning method, where the method includes:
acquiring a data set, wherein the data set comprises a plurality of text units, and labeling entities in the text units;
classifying the data set according to entity types, and dividing the data set into a plurality of entity type groups;
recording the start location and the end location of each entity in the text unit to obtain the location information of each entity;
dividing the data set into a plurality of entity location groups according to the location information;
counting the proportion of each entity location group in the data set;
for each entity type group, dividing the group into a plurality of subgroups according to the plurality of entity location groups, to complete the grouping.
In one possible design, the text unit includes: chapters, paragraphs, and sentences.
In one possible design, the location information for each entity includes a start location, an end location, and a length of an entity name for the entity.
In one possible design, the dividing the data set into a plurality of entity location groups according to the location information includes:
grouping each entity according to the location information of the entity;
determining the entity location group of each text unit based on the groups of all entities that the text unit includes.
In one possible design, grouping the entities includes:
if the location information indicates that the entity appears in the first third of the text unit, assigning the entity to the beginning group;
if the location information indicates that the entity appears between one third and two thirds of the text unit, assigning the entity to the intermediate group;
if the location information indicates that the entity appears in the last third of the text unit, assigning the entity to the ending group.
In one possible design, the entity location groups include: a beginning group, an intermediate group, an ending group, a beginning intermediate group, a beginning ending group, an intermediate ending group, and a beginning intermediate ending group.
In one possible design, the method further comprises:
for each entity type group, dividing a training set and a test set according to a preset proportion and the proportion of each entity location group, so that the proportion of text units of each entity location group is the same in the training set and the test set.
In a second aspect, embodiments of the present application provide an apparatus, the apparatus comprising:
a receiving module, configured to acquire a data set, wherein the data set comprises a plurality of text units, and to label the entities in the text units;
a processing module, configured to classify the data set according to entity types and divide the data set into a plurality of entity type groups; record the start location and the end location of each entity in the text unit to obtain the location information of each entity; divide the data set into a plurality of entity location groups according to the location information; count the proportion of each entity location group in the data set; and, for each entity type group, divide the group into a plurality of subgroups according to the plurality of entity location groups, to complete the grouping.
In a third aspect, embodiments of the present application provide an electronic device including a memory and one or more processors; wherein the memory is for storing computer program code, the computer program code comprising computer instructions; the computer instructions, when executed by the processor, cause the electronic device to perform part or all of the steps of the method of the first aspect or in various possible implementations of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer storage medium having instructions stored therein which, when executed on a computer, cause the computer to perform some or all of the steps of the method of the first aspect or in various possible implementations of the first aspect.
The application provides a data set dividing method, comprising: acquiring a data set, wherein the data set comprises a plurality of text units, and labeling the entities in the text units; classifying the data set according to entity types, and dividing the data set into a plurality of entity type groups; recording the start location and the end location of each entity in the text unit to obtain the location information of each entity; dividing the data set into a plurality of entity location groups according to the location information; counting the proportion of each entity location group in the data set; and, for each entity type group, dividing the group into a plurality of subgroups according to the plurality of entity location groups, to complete the grouping. The method records the location of every entity to obtain its location information, forms groups by location, and then divides the data set into a plurality of subgroups within each entity type, yielding a data set classified by both entity type and entity location. Each subgroup contains entities of a specific entity type at a specific location, which enables a stratified sampling strategy so that the data of the training set and the test set are distributed more evenly in terms of entity type, context and location information.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a flowchart of a data set partitioning method provided in an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a data set dividing apparatus according to an embodiment of the present application;
Fig. 3 is an exemplary structural schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The terminology used in the following embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that, although the terms first, second, etc. may be used in the following embodiments to describe certain types of objects, the objects should not be limited to these terms. These terms are only used to distinguish between specific objects of that class of objects. For example, the terms first, second, etc. may be used in the following embodiments to describe an entity, but the entity should not be limited to these terms. These terms are only used to distinguish between different entities. Other classes of objects that may be described in the following embodiments using the terms first, second, etc. are not described here again.
Embodiments of the present application relate to the medical field.
The embodiment of the application provides a data set dividing method, a data set dividing device and electronic equipment.
The data set partitioning method according to the embodiments of the present application is described below through several implementations.
Fig. 1 illustrates a data set partitioning method 100 (hereinafter referred to as method 100), which includes the following steps:
Step S101, a data set is obtained, wherein the data set comprises a plurality of text units, and the entities in the text units are labeled.
In this embodiment, the data set is the set of data used to train and test the entity recognition model. To make the data distributions of the training set and the test set similar, the data set must first be classified, and the entities in it are labeled so that they can be classified. In addition, the entities considered in this application are polysemous entities: only entities with multiple meanings need to be divided by type and position, whereas other entities can be divided with a traditional data set division.
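As a minimal sketch of the input described in step S101, the annotated data set could be represented as follows. The class names `Entity` and `TextUnit`, their fields, the helper `ambiguous_values` and the sample sentences are illustrative assumptions and do not come from the application; character offsets are also an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Entity:
    value: str        # surface text of the entity, e.g. "immune"
    entity_type: str  # annotated label, e.g. "immunotherapy"
    start: int        # start offset within the text unit (inclusive)
    end: int          # end offset within the text unit (exclusive)


@dataclass
class TextUnit:
    text: str
    entities: List[Entity] = field(default_factory=list)


def ambiguous_values(units):
    """Return the entity values that are annotated with more than one type."""
    types_by_value = {}
    for unit in units:
        for entity in unit.entities:
            types_by_value.setdefault(entity.value, set()).add(entity.entity_type)
    return {value for value, types in types_by_value.items() if len(types) > 1}


# Hypothetical sample data, for illustration only.
units = [
    TextUnit("immune therapy was started", [Entity("immune", "immunotherapy", 0, 6)]),
    TextUnit("an immune disorder was found", [Entity("immune", "autoimmune_disease", 3, 9)]),
    TextUnit("fever persisted", [Entity("fever", "symptom", 0, 5)]),
]
polysemous = ambiguous_values(units)
```

Here only "immune" is annotated with two types, so it is the only value that would need the type-and-location division; "fever" could be handled by a conventional split.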
Step S102, classifying the data set according to entity types, and dividing the data set into a plurality of entity type groups.
In this embodiment, the data set is first grouped according to the conventional type of each entity.
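The grouping in step S102 can be sketched as below. This is only one possible reading: the application does not say how a text unit containing entities of several types is handled, so listing such a unit under every type it contains is an assumption, as are the function name `group_by_entity_type` and the dictionary layout.

```python
from collections import defaultdict


def group_by_entity_type(units):
    """Map each entity type to the indices of the text units that contain it.

    Assumption: a unit with entities of several types is listed under each type.
    """
    groups = defaultdict(list)
    for index, unit in enumerate(units):
        # Deduplicate types so a unit is listed once per type.
        for entity_type in sorted({e["type"] for e in unit["entities"]}):
            groups[entity_type].append(index)
    return dict(groups)


# Hypothetical sample data, for illustration only.
units = [
    {"text": "immune therapy was started",
     "entities": [{"value": "immune", "type": "immunotherapy"}]},
    {"text": "an immune disorder was found",
     "entities": [{"value": "immune", "type": "autoimmune_disease"}]},
    {"text": "immune therapy continued",
     "entities": [{"value": "immune", "type": "immunotherapy"}]},
]
type_groups = group_by_entity_type(units)
```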
Step S103, recording the start location and the end location of each entity in the text unit, and obtaining the location information of each entity.
In this embodiment, because an entity may carry different meanings at different positions in a text, the position of each entity must be recorded. The different positions may be different paragraphs of an article, such as the introduction, the middle and the closing part; different sentences within a paragraph; or the beginning, middle and end of the same sentence.
Step S104, dividing the data set into a plurality of entity location groups according to the location information.
In this embodiment, each text unit may include one or more entities, and the positions of these entities may be the same or different. The text units are divided into entity location groups according to the location information of all entities that each text unit includes.
Step S105, counting the proportion of each entity location group in the data set.
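Counting the proportions in step S105 amounts to a frequency count over the location-group label of each text unit. The sketch below assumes that each text unit has already been given one label; the function name `location_group_proportions` and the sample labels are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def location_group_proportions(unit_groups):
    """Return the exact share of each entity location group in the data set."""
    counts = Counter(unit_groups)
    total = sum(counts.values())
    return {group: Fraction(n, total) for group, n in counts.items()}


# One label per text unit (hypothetical values).
unit_groups = ["beginning", "beginning", "ending", "beginning ending"]
proportions = location_group_proportions(unit_groups)
```

Exact fractions are used here only to keep the example free of rounding noise; floating-point shares would serve equally well in practice.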
Step S106, for each entity type group, dividing the group into a plurality of subgroups according to the entity location groups, to complete the grouping.
In this embodiment, for example, the "immunotherapy" entity type group can be divided into different subgroups according to the entity location groups, which yields a data set classified by entity type and entity location. The data set is thus divided into multiple strata, each containing the samples for one combination of entity type and entity location information. For example, one stratum may contain the samples in which an entity of the "immunotherapy" type occurs at the beginning of the text. A stratified sampling strategy can then be applied to each stratum, ensuring a more balanced distribution of the training and test sets in terms of entity type, context and location information.
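The subgroups described above can be sketched as strata keyed by the pair (entity type, entity location group). The record layout and the name `build_strata` are assumptions made for illustration.

```python
from collections import defaultdict


def build_strata(records):
    """Group unit ids by (entity_type, location_group).

    records: iterable of (unit_id, entity_type, location_group) tuples.
    """
    strata = defaultdict(list)
    for unit_id, entity_type, location_group in records:
        strata[(entity_type, location_group)].append(unit_id)
    return dict(strata)


# Hypothetical records, for illustration only.
records = [
    (0, "immunotherapy", "beginning"),
    (1, "immunotherapy", "ending"),
    (2, "autoimmune_disease", "beginning"),
    (3, "immunotherapy", "beginning"),
]
strata = build_strata(records)
```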
In an alternative embodiment, the text unit includes: chapters, paragraphs, and sentences.
In this embodiment, using different text units makes it possible to determine the probability that an entity appears at different positions. Entities with the same name but different types often appear at different positions in an article with different probabilities. For example, the same entity value may denote a disease type or a symptom type; diseases tend to appear in the introduction and the conclusion, whereas symptoms appear more often in the body of the article. Similarly, other polysemous entities appear with different probabilities at different positions of an article. The text units of this application therefore include chapters, paragraphs and sentences, covering the influence of earlier and later paragraphs within a chapter, of earlier and later sentences within a paragraph, and of earlier and later positions within a sentence, so as to obtain more comprehensive entity location information.
In an alternative embodiment, the location information of each entity includes a start location, an end location, and a length of an entity name of the entity.
In this embodiment, when the text unit is a chapter, the location information of the entity further includes which paragraph of the article the entity is in; when the text unit is a paragraph, the location information further includes which sentence of the paragraph the entity is in; when the text unit is a sentence, the location information further includes the byte of the sentence at which the entity is located.
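A minimal sketch of the location information for a sentence-level text unit follows. It records the start location, end location and length of the entity name; using character offsets (rather than bytes) and finding the entity with `str.find` are simplifying assumptions, and the name `locate` is hypothetical. In an annotated corpus the offsets would normally come from the annotation itself.

```python
def locate(text, value, search_from=0):
    """Return start, end (exclusive) and length of `value` in `text`, or None."""
    start = text.find(value, search_from)
    if start == -1:
        return None
    end = start + len(value)
    return {"start": start, "end": end, "length": end - start}


info = locate("an immune disorder was found", "immune")
missing = locate("fever persisted", "immune")
```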
In an alternative embodiment, the dividing the data set into a plurality of entity location groups according to the location information includes:
grouping each entity according to the location information of the entity;
determining the entity location group of each text unit based on the groups of all entities that the text unit includes.
In this embodiment, the position of each entity within the text unit is first determined from the location information of the entity. A text unit may include more than one entity, and the location groups of these entities may be the same or different. To assign the text unit to a location group more accurately, the location information of all entities included in that text unit must therefore be combined.
In an alternative embodiment, grouping the entities includes:
if the location information indicates that the entity appears in the first third of the text unit, assigning the entity to the beginning group;
if the location information indicates that the entity appears between one third and two thirds of the text unit, assigning the entity to the intermediate group;
if the location information indicates that the entity appears in the last third of the text unit, assigning the entity to the ending group.
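The three-way rule above can be sketched as follows. The application does not say which point of an entity decides its group or how the exact boundaries are treated, so using the start offset and assigning boundary positions to the later group are assumptions; the function name `position_bucket` is hypothetical.

```python
def position_bucket(start, unit_length):
    """Classify an entity by where its start offset falls in the text unit.

    Integer arithmetic avoids floating-point issues at the 1/3 and 2/3 marks.
    """
    if unit_length <= 0:
        raise ValueError("unit_length must be positive")
    if not 0 <= start < unit_length:
        raise ValueError("start must lie inside the text unit")
    if 3 * start < unit_length:        # start / unit_length < 1/3
        return "beginning"
    if 3 * start < 2 * unit_length:    # 1/3 <= start / unit_length < 2/3
        return "intermediate"
    return "ending"                    # start / unit_length >= 2/3


buckets = [position_bucket(s, 30) for s in (0, 9, 10, 19, 20, 29)]
```

For a 30-character unit, offsets 0 to 9 fall in the beginning group, 10 to 19 in the intermediate group, and 20 to 29 in the ending group.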
In an alternative embodiment, the entity location groups include: a beginning group, an intermediate group, an ending group, a beginning intermediate group, a beginning ending group, an intermediate ending group, and a beginning intermediate ending group.
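The seven groups are the non-empty combinations of the three entity-level groups. A text unit's group can therefore be derived from the set of groups of its entities, as in the sketch below. The label strings and the names `ORDER`, `ALL_LOCATION_GROUPS` and `unit_location_group` are illustrative assumptions.

```python
from itertools import combinations

ORDER = ("beginning", "intermediate", "ending")

# All non-empty combinations of the three entity-level groups: 3 + 3 + 1 = 7.
ALL_LOCATION_GROUPS = [
    " ".join(combo)
    for size in range(1, len(ORDER) + 1)
    for combo in combinations(ORDER, size)
]


def unit_location_group(entity_buckets):
    """Combine the groups of all entities in one text unit into one label."""
    present = set(entity_buckets)
    if not present:
        raise ValueError("a text unit needs at least one entity")
    unknown = present - set(ORDER)
    if unknown:
        raise ValueError(f"unknown groups: {sorted(unknown)}")
    # Keep the canonical order so that the label is independent of input order.
    return " ".join(name for name in ORDER if name in present)


label = unit_location_group(["ending", "beginning", "beginning"])
```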
In an alternative embodiment, the method further comprises:
for each entity type group, dividing a training set and a test set according to a preset proportion and the proportion of each entity location group, so that the proportion of text units of each entity location group is the same in the training set and the test set.
In this embodiment, the proportion of each entity location group is counted first. When the training set and the test set are divided, these proportions must be preserved in both sets so that the model can learn the characteristics of each entity location group. For example, if the seven entity location groups occur in the data set in the ratio 1:1:2:3:1:1:2, then after the division the seven entity location groups of the same entity type should also occur in the ratio 1:1:2:3:1:1:2 in both the training set and the test set.
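The 1:1:2:3:1:1:2 example can be reproduced with a small stratified split. This is a sketch under stated assumptions, not the application's implementation: the 20% test fraction, the fixed seed, the rounding rule, the tuple layout and the name `stratified_split` are all chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split samples so that each (type, location group) stratum keeps its share.

    samples: list of (sample_id, entity_type, location_group) tuples.
    """
    strata = defaultdict(list)
    for sample in samples:
        strata[(sample[1], sample[2])].append(sample)
    rng = random.Random(seed)
    train, test = [], []
    for key in sorted(strata):
        items = strata[key][:]
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


groups = ["beginning", "intermediate", "ending", "beginning intermediate",
          "beginning ending", "intermediate ending", "beginning intermediate ending"]
counts = [10, 10, 20, 30, 10, 10, 20]  # ratio 1:1:2:3:1:1:2
samples = [(f"{g}-{i}", "immunotherapy", g)
           for g, n in zip(groups, counts) for i in range(n)]

train, test = stratified_split(samples, test_fraction=0.2)
train_counts = Counter(s[2] for s in train)
test_counts = Counter(s[2] for s in test)
```

With these counts the test set receives 2, 2, 4, 6, 2, 2 and 4 units and the training set 8, 8, 16, 24, 8, 8 and 16, so both sets keep the 1:1:2:3:1:1:2 ratio. When stratum sizes do not divide evenly, rounding makes the ratios only approximately equal.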
In summary, the data set dividing method of the embodiments of the application is applied to the medical field. It counts the probability with which entity types and their entity values appear at different positions in the sentences of the data set, groups the data set by entity type, and then further divides each group into subgroups according to the location of the entities in the specific text. Each subgroup is then sampled according to the counted proportions, so that the training set and the test set are more evenly distributed in terms of entity type, context and location information.
Corresponding to method 100, the embodiments of the application also provide a device for executing the method.
Fig. 2 illustrates a data set dividing apparatus 200, which comprises:
a receiving module 201, configured to obtain a data set, where the data set includes a plurality of text units, and label entities in the text units;
a processing module 202, configured to classify the data set according to entity types and divide the data set into a plurality of entity type groups; record the start location and the end location of each entity in the text unit to obtain the location information of each entity; divide the data set into a plurality of entity location groups according to the location information; count the proportion of each entity location group in the data set; and, for each entity type group, divide the group into a plurality of subgroups according to the plurality of entity location groups, to complete the grouping.
It will be understood that the above division of each module/unit is merely a division of a logic function, and when actually implemented, the functions of each module may be integrated into a hardware entity, for example, the functions of the processing module may be integrated into a processor implementation, the functions of the receiving module may be integrated into a transceiver implementation, and a program and an instruction for implementing the functions of each module may be maintained in a memory. For example, fig. 3 provides an electronic device 300, the electronic device 300 comprising a processor 301, a transceiver 302, and a memory 303. Wherein the transceiver 302 is configured to perform the transceiving of data and signals in the method 100. The memory 303 may be used to store programs/code and the like required by the processor 301 to perform the method 100.
In a specific implementation, corresponding to the foregoing electronic device 300, the embodiments of the present application further provide a computer storage medium, where the computer storage medium provided in the electronic device 300 may store a program, and when the program is executed, may implement some or all of the steps including the embodiments of the method 100. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random-access memory (random access memory, RAM), or the like.
Those skilled in the art will appreciate that, for convenience and brevity, the specific working procedures of the above-described systems, apparatuses and units may refer to the corresponding procedures in the foregoing method embodiments, which are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed methods, apparatuses, and systems may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in whole or in part in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a control device for a cloud game, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
While alternative embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
The foregoing embodiments are provided to illustrate the general principles of the present invention and are not intended to limit the scope of the invention.

Claims (9)

1. A method of partitioning a data set, the method comprising:
acquiring a data set, wherein the data set comprises a plurality of text units, and labeling entities in the text units;
classifying the data set according to entity types, and dividing the data set into a plurality of entity type groups;
recording the starting position and the ending position of each entity in the text unit to obtain the position information of each entity;
dividing the dataset into a plurality of entity location groups according to the location information;
counting the proportion of each entity position group in the data set;
for each entity type group, dividing the group into a plurality of subgroups according to the plurality of entity location groups, comprising: for each entity type group, dividing a training set and a test set according to a preset proportion and the proportion of each entity location group, so that the proportion of text units of each entity location group is the same in the training set and the test set, to complete the grouping.
2. The method of claim 1, wherein the text unit comprises: chapters, paragraphs, and sentences.
3. The method of claim 1, wherein the location information for each entity comprises a starting location, an ending location, and a length of an entity name for the entity.
4. The method of claim 1, wherein said dividing said data set into a plurality of entity location groups according to said location information comprises:
grouping each entity according to the location information of the entity;
determining the entity location group of each text unit based on the groups of all entities that the text unit includes.
5. The method of claim 4, wherein said grouping the entities comprises:
if the location information indicates that the entity appears in the first third of the text unit, assigning the entity to the beginning group;
if the location information indicates that the entity appears between one third and two thirds of the text unit, assigning the entity to the intermediate group;
if the location information indicates that the entity appears in the last third of the text unit, assigning the entity to the ending group.
6. The method of claim 5, wherein the plurality of entity location groups comprises: a beginning group, an intermediate group, an ending group, a beginning intermediate group, a beginning ending group, an intermediate ending group, and a beginning intermediate ending group.
7. A data set partitioning apparatus, the apparatus comprising:
a receiving module, configured to acquire a data set, wherein the data set comprises a plurality of text units, and to label the entities in the text units;
a processing module, configured to classify the data set according to entity types and divide the data set into a plurality of entity type groups; record the start location and the end location of each entity in the text unit to obtain the location information of each entity; divide the data set into a plurality of entity location groups according to the location information; count the proportion of each entity location group in the data set; and, for each entity type group, divide the group into a plurality of subgroups according to the plurality of entity location groups, to complete the grouping.
8. An electronic device comprising a memory and one or more processors; wherein the memory is for storing computer program code, the computer program code comprising computer instructions; the computer instructions, when executed by the processor, cause the electronic device to perform the method of any one of claims 1 to 6.
9. A computer readable storage medium comprising a computer program which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 6.
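
The location grouping recited in claims 5 and 6, and the proportion count recited in claim 7, can be illustrated with a minimal Python sketch. It is not the patented implementation. All function names and label strings are hypothetical, entities are assumed to be given as (start, end) character offsets, and the midpoint of an entity is assumed to decide which third it falls in, since the claims do not say how to treat an entity that straddles a boundary. Because a text unit can contain entities from any non-empty combination of the three thirds, there are 2^3 - 1 = 7 possible labels, which matches the seven groups listed in claim 6.

```python
from collections import Counter

# Hypothetical labels, named after the groups listed in claim 6.
BEGINNING, INTERMEDIATE, ENDING = "beginning", "intermediate", "ending"
ORDER = (BEGINNING, INTERMEDIATE, ENDING)


def entity_group(start, end, text_length):
    """Assign one entity to a third of its text unit.

    Assumption: the entity midpoint decides the third; the claims only
    state where the entity "appears".
    """
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    midpoint = (start + end) / 2
    if midpoint < text_length / 3:
        return BEGINNING
    if midpoint < 2 * text_length / 3:
        return INTERMEDIATE
    return ENDING


def location_group(entities, text_length):
    """Combine the groups of all entities in one text unit into one label.

    Returns an empty string for a text unit without entities.
    """
    present = {entity_group(s, e, text_length) for s, e in entities}
    return "-".join(g for g in ORDER if g in present)


def location_proportions(text_units):
    """Return the share of each entity location group in the data set.

    text_units is an iterable of (text_length, [(start, end), ...]) pairs.
    """
    counts = Counter(location_group(ents, length) for length, ents in text_units)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


if __name__ == "__main__":
    data = [
        (30, [(0, 4)]),
        (30, [(0, 4), (25, 29)]),
        (30, [(12, 16)]),
        (30, [(0, 4)]),
    ]
    print(location_proportions(data))
```

The resulting proportions could then serve as strata when each entity type group is split into subgroups, so that every subgroup keeps roughly the same mix of location groups; that splitting step is not shown here.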
CN202310491927.9A 2023-05-05 2023-05-05 Data set dividing method and device and electronic equipment Active CN116205235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310491927.9A CN116205235B (en) 2023-05-05 2023-05-05 Data set dividing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN116205235A (en) 2023-06-02
CN116205235B (en) 2023-08-01

Family

ID=86511502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310491927.9A Active CN116205235B (en) 2023-05-05 2023-05-05 Data set dividing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116205235B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117874173A (en) * 2024-03-11 2024-04-12 腾讯科技(深圳)有限公司 Training method and related device of vector model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201717432D0 (en) * 2016-10-28 2017-12-06 Kira Inc System and method for extracting entities in electronic documents
CN109472033A (en) * 2018-11-19 2019-03-15 华南师范大学 Entity relation extraction method and system in text, storage medium, electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079377B (en) * 2019-12-03 2022-12-13 哈尔滨工程大学 Method for recognizing named entities of Chinese medical texts
US20210390256A1 (en) * 2020-06-15 2021-12-16 International Business Machines Corporation Methods and systems for multiple entity type entity recognition
CN112182204A (en) * 2020-08-19 2021-01-05 广东汇银贸易有限公司 Method and device for constructing corpus labeled by Chinese named entities
CN111986765B (en) * 2020-09-03 2023-11-21 深圳平安智慧医健科技有限公司 Electronic case entity marking method, electronic case entity marking device, electronic case entity marking computer equipment and storage medium
CN113177411A (en) * 2021-03-31 2021-07-27 杭州费尔斯通科技有限公司 Training method of named entity recognition model and named entity recognition method
US20230019081A1 (en) * 2021-07-16 2023-01-19 Microsoft Technology Licensing, Llc Modular self-supervision for document-level relation extraction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
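Data set design and evaluation metrics for imbalanced text sentiment classification; 赵立东; 李德玉; 王素格; 电脑开发与应用 (Computer Development & Applications), No. 05; full text *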

Also Published As

Publication number Publication date
CN116205235A (en) 2023-06-02

Similar Documents

Publication Publication Date Title
CN116205235B (en) Data set dividing method and device and electronic equipment
US10878335B1 (en) Scalable text analysis using probabilistic data structures
US20170068654A1 (en) Method and system for extracting sentences
CN110378346B (en) Method, device and equipment for establishing character recognition model and computer storage medium
CN103970826B (en) Retrieve device and search method
CN106610931B (en) Topic name extraction method and device
CN109801693B (en) Medical records grouping method and device, terminal and computer readable storage medium
CA2853627A1 (en) Automatic creation of clinical study reports
CN112329460A (en) Text topic clustering method, device, equipment and storage medium
CN109101603A (en) A kind of data comparison method, device, equipment and storage medium
CN109101435A (en) The multi partition recognition methods of movable storage device and system, car-mounted terminal
CN115840654B (en) Message processing method, system, computing device and readable storage medium
CN116860583A (en) Database performance optimization method and device, storage medium and electronic equipment
CN111061927B (en) Data processing method and device and electronic equipment
CN110990207A (en) BPS memory test method, system, terminal and storage medium based on Whitley platform
CN114281977A (en) Similar document searching method and device based on massive documents
CN104077282A (en) Method and device for processing data
CN109947933B (en) Method and device for classifying logs
CN109739817B (en) Method and system for storing data file in big data storage system
US20200097306A1 (en) Lightweight and precise value profiling
CN110764777A (en) ELF file generation method, ELF file, equipment and storage medium
CN116860761B (en) Data acquisition method, electronic equipment and storage medium
CN112152873B (en) User identification method and device, computer equipment and storage medium
WO2015004571A1 (en) Method and system for implementing a bit array in a cache line
CN109446226B (en) Method and equipment for determining data set

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant