CN115269840A - Map label obtaining method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN115269840A
- Authority
- CN
- China
- Prior art keywords
- data
- text data
- label
- map
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The application provides a map label obtaining method and apparatus, an electronic device and a storage medium. The method includes: training an initial model for label extraction based on a training set in which map text data are labeled with labels in advance; acquiring a data set of map text data whose labels are to be annotated, and extracting labels of the map text data in the data set by using the initial model; acquiring a manual verification result for the labels of the map text data in the data set, and screening, from the data set based on the manual verification result, incremental annotation data that have passed verification and carry labels; adjusting and training the initial model based on the incremental annotation data to obtain a target model whose model index meets the business requirement; and acquiring labels of user-generated content in the map based on the target model. The method and apparatus of the application can improve the efficiency of map label extraction.
Description
Technical Field
The application relates to the field of games, and in particular to a map label obtaining method and apparatus, an electronic device and a storage medium.
Background
In the field of games, labels sometimes need to be extracted from map text data. In the prior art, a model is usually trained in a supervised manner and then used for label extraction. Because supervised training requires a labeled training corpus to be prepared in advance, a large amount of labor and time cost is consumed, and the efficiency of map label extraction is low.
Disclosure of Invention
An object of the present application is to provide a map label obtaining method and apparatus, an electronic device and a storage medium, which can improve the efficiency of map label extraction.
According to an aspect of the embodiments of the present application, a method for obtaining a map label is disclosed, the method comprising:
training an initial model for label extraction based on a training set in which map text data are labeled with labels in advance;
acquiring a data set of map text data whose labels are to be annotated, and extracting labels of the map text data in the data set by using the initial model;
acquiring a manual verification result for the labels of the map text data in the data set, and screening, from the data set based on the manual verification result, incremental annotation data that have passed verification and carry labels;
adjusting and training the initial model based on the incremental annotation data to obtain a target model whose model index meets the business requirement;
and acquiring labels of user-generated content in the map based on the target model.
According to an aspect of the embodiments of the present application, a map label obtaining apparatus is disclosed, the apparatus comprising:
a first training module configured to train an initial model for label extraction based on a training set in which map text data are labeled with labels in advance;
a first label module configured to acquire a data set of map text data whose labels are to be annotated, and to extract labels of the map text data in the data set by using the initial model;
a verification module configured to acquire a manual verification result for the labels of the map text data in the data set, and to screen, from the data set based on the manual verification result, incremental annotation data that have passed verification and carry labels;
a second training module configured to adjust and train the initial model based on the incremental annotation data to obtain a target model whose model index meets the business requirement;
and a second label module configured to acquire labels of user-generated content in the map based on the target model.
According to an aspect of the embodiments of the present application, an electronic device is disclosed, comprising: one or more processors; and storage means for storing one or more programs that, when executed by the one or more processors, cause the electronic device to implement the method of any of the above embodiments.
According to an aspect of the embodiments of the present application, a computer program medium is disclosed, having computer readable instructions stored thereon which, when executed by a processor of a computer, cause the computer to perform the method of any of the above embodiments.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations described above.
In the embodiments of the application, incremental annotation data are screened, based on a manual verification result, from the map text data in the data set labeled by the initial model, and the initial model is then adjusted and trained based on the incremental annotation data. This reduces the amount of map text data that must be labeled in advance when training a target model that meets the business requirement, lowers the labor cost, improves the training efficiency of the target model, and thereby improves the efficiency of extracting map labels based on the target model.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
Fig. 1 shows a flowchart of a map label obtaining method according to an embodiment of the present application.
FIG. 2 shows a flow diagram of map label acquisition and application according to one embodiment of the present application.
FIG. 3 shows a data processing diagram of a model in extracting tags according to one embodiment of the present application.
FIG. 4 shows a data processing diagram of a model in extracting tags according to one embodiment of the present application.
FIG. 5 shows a data processing diagram of a model for concatenated multilingual corpora according to one embodiment of the present application.
Fig. 6 shows a block diagram of a map label obtaining apparatus according to an embodiment of the present application.
FIG. 7 illustrates an electronic device hardware diagram according to one embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the present application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. In the following description, numerous specific details are provided to give a thorough understanding of example embodiments of the present application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, steps, and so forth. In other instances, well-known structures, methods, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The application provides a map label obtaining method, which can be applied to the field of games to extract labels of user-generated content (UGC) in an in-game map. The extracted labels can be used to audit the user-generated content in the map, and, when a user shares a custom map in the game community, the extracted labels can be provided to the user for reference, so that the user can reasonably customize the labels of the custom map.
Fig. 1 shows a flowchart of the map label obtaining method provided in an embodiment of the present application. An exemplary execution subject of the method is a server, and the method includes:
step S110, training an initial model for label extraction based on a training set in which map text data are labeled with labels in advance;
step S120, acquiring a data set of map text data whose labels are to be annotated, and extracting labels of the map text data in the data set by using the initial model;
step S130, acquiring a manual verification result for the labels of the map text data in the data set, and screening, from the data set based on the manual verification result, incremental annotation data that have passed verification and carry labels;
step S140, adjusting and training the initial model based on the incremental annotation data to obtain a target model whose model index meets the business requirement;
and step S150, acquiring labels of user-generated content in the map based on the target model.
In the embodiment of the application, a training set is obtained. The training set is composed of map text data labeled with labels in advance, and an initial model for label extraction is trained on this training set in a supervised manner.
Because map text data labeled in advance usually require considerable labor and time cost, the performance of the initial model is limited when the training cost and time are kept under control, and the model therefore needs further optimization.
To further optimize the initial model, a data set composed of map text data that are not yet labeled is obtained. Labels of the map text data in the data set are extracted by using the initial model, and the labels are then provided to a manual verification system for checking, which yields a manual verification result for the labels of the map text data in the data set.
The manual verification result describes whether the label of each piece of map text data in the data set passes verification: a label that passes verification is correct, and a label that fails verification is unsatisfactory. Based on the manual verification result, incremental annotation data that have passed verification and carry labels are screened from the data set.
The initial model is then adjusted and trained based on the obtained incremental annotation data, so that it is optimized into a target model whose model index meets the business requirement. The model index may describe the verification pass rate of the labels that the model assigns to the map text data in the data set.
After the target model is obtained, it is put into use to extract labels of user-generated content in the map.
Therefore, in the embodiments of the application, incremental annotation data are screened, based on the manual verification result, from the map text data in the data set labeled by the initial model, and the initial model is then adjusted and trained based on the incremental annotation data. This reduces the amount of map text data that must be labeled in advance when training a target model that meets the business requirement, lowers the labor cost, improves the training efficiency of the target model, and thereby improves the efficiency of extracting map labels based on the target model.
Fig. 2 shows a flowchart of map label acquisition and application according to an embodiment of the present application.
Referring to fig. 2, in this embodiment, the map name and description text are extracted as the corresponding map text data, and the map text data are then preprocessed to obtain a training set labeled with labels, a data set to be labeled, and user-generated content.
An initial model is trained with the labeled training set, and whether the initial model meets the business requirement is determined from its performance. If the business requirement is not met, labels of the map text data in the data set to be labeled are extracted by using the initial model, and incremental annotation data are screened out in combination with manual verification. The labeled training set is updated with the incremental annotation data, and the initial model is adjusted and trained; the adjustment and training terminate when the performance of the model meets the business requirement, yielding the target model.
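For illustration only, the iterative flow described above can be sketched as follows. This is a minimal sketch, not the disclosed implementation: the callables `train`, `predict` and `verify`, the `init_from` keyword, and the pass-rate threshold of 0.9 are hypothetical placeholders standing in for the supervised training routine, the label extraction step and the manual verification system.

```python
from typing import Callable, List, Tuple

def build_target_model(
    train_set: List[Tuple[str, list]],   # map text data labeled with labels in advance
    unlabeled_pool: List[str],           # data set of map text data whose labels are to be annotated
    train: Callable,                     # trains (or fine-tunes) a label-extraction model on a labeled set
    predict: Callable,                   # extracts labels for a list of texts with a given model
    verify: Callable,                    # manual verification: one bool per (text, predicted labels) pair
    required_pass_rate: float = 0.9,     # illustrative "model index" / business requirement
    max_rounds: int = 10,
):
    model = train(train_set)                                   # initial model (step S110)
    for _ in range(max_rounds):
        predictions = predict(model, unlabeled_pool)           # label the unlabeled data set (step S120)
        verdicts = verify(unlabeled_pool, predictions)         # manual verification results (step S130)
        if verdicts and sum(verdicts) / len(verdicts) >= required_pass_rate:
            break                                              # model index meets the business requirement
        # incremental annotation data: samples whose predicted labels passed verification
        incremental = [(t, p) for t, p, ok in zip(unlabeled_pool, predictions, verdicts) if ok]
        unlabeled_pool = [t for t, ok in zip(unlabeled_pool, verdicts) if not ok]
        train_set = train_set + incremental
        model = train(train_set, init_from=model)              # adjust and train the initial model (step S140)
    return model                                               # target model, used for UGC labels (step S150)
```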
After the target model is obtained, it is used to extract labels of the user-generated content in the map, yielding candidate label templates. The candidate label templates are manually filtered to obtain the final label templates, which are put into application so that users can match and configure map labels according to the templates.
In one embodiment, target map text data are segmented into single characters, and first position information of each single character in its sentence is acquired. Each single character is vectorized to obtain a character vector, and the character vector, the first position information and the corresponding sentence identifier are concatenated to obtain sequence data. The sequence data are input into the initial model or the target model to obtain the sequence tag probabilities output by the model that receives the sequence data, and the labels of the target map text data are acquired based on the sequence tag probabilities.
This embodiment mainly describes the data processing logic of the model during label extraction. The data processing logic described here applies both to label extraction in the initial model training stage and to label extraction in the target model use stage.
Specifically, in the initial model training stage, the target map text data are the map text data in the training set or the incremental annotation data; in the target model use stage, the target map text data are the user-generated content in the map.
After the target map text data of the current stage are obtained, they are segmented into single characters. First position information of each single character in its sentence is acquired, and each single character is vectorized to obtain its character vector.
The character vector of each single character, the first position information and the corresponding sentence identifier are then concatenated to obtain sequence data corresponding to the target map text data. The sentence identifier uniquely identifies the sentence in which the character is located.
After the sequence data are obtained: if the current stage is the initial model training stage, the sequence data are input into the initial model to obtain the sequence tag probabilities output by the initial model, and the labels of the map text data in the training set or of the incremental annotation data extracted by the initial model are then acquired based on the sequence tag probabilities; if the current stage is the target model use stage, the sequence data are input into the target model to obtain the sequence tag probabilities output by the target model, and the labels of the user-generated content in the map extracted by the target model are then acquired based on the sequence tag probabilities.
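For illustration only, a minimal sketch of the sequence construction described above. The sentence-splitting rule and the 1-based position numbering are assumptions; the embedding layer that turns each character into a character vector is omitted here.

```python
import re
from typing import Dict, List

def build_character_sequence(target_text: str) -> List[Dict]:
    """Segment target map text data into single characters and attach, to each
    character, its first position information and the sentence identifier."""
    sequence = []
    # split into sentences on common end punctuation (an assumption, not mandated by the text)
    sentences = [s for s in re.split(r"(?<=[。！？!?.])", target_text) if s]
    for sen_id, sentence in enumerate(sentences):
        for pos, char in enumerate(sentence, start=1):
            sequence.append({
                "char": char,   # later vectorized into a character vector (token)
                "pos": pos,     # first position information within the sentence
                "sen": sen_id,  # sentence identifier
            })
    return sequence
```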
In one embodiment, acquiring the labels of the target map text data based on the sequence tag probabilities includes:
converting the sequence tag probabilities by using a conditional random field (CRF) to obtain the labels of the target map text data.
In this embodiment, the conditional random field (CRF) is a conditional probability distribution model that converts the input sequence tag probabilities into final tags according to the transition probabilities between tags.
Fig. 3 shows a data processing diagram of a model during label extraction according to an embodiment of the present application.
Referring to fig. 3, in this embodiment, the map name and description text are used as the target map text data whose labels are to be extracted. The text is segmented into single characters, each single character is processed into a corresponding character vector (token), and the character vector is combined with the position information (pos) of the character in the sentence and the corresponding sentence identifier (sen) to obtain the input vector E for that character. The input vectors E are arranged in order to form the sequence data. The sequence data are then input into a BERT model, which outputs the corresponding context vectors T; the sequence tag probabilities are obtained through a softmax function, processed by the CRF, and the final BIEO (Begin-Inside-End-Outside) tags are output.
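For illustration only, a minimal sketch of the BERT-plus-CRF arrangement shown in fig. 3, assuming PyTorch, the Hugging Face `transformers` library and the `pytorch-crf` package are available; the `bert-base-chinese` checkpoint and the four-tag BIEO scheme are illustrative assumptions rather than the specific configuration of the embodiment.

```python
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # from the pytorch-crf package

class BertCrfTagger(nn.Module):
    """Context vectors T from BERT -> per-token tag scores -> CRF decoding into BIEO tags."""

    def __init__(self, num_tags: int = 4, pretrained: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)      # outputs the context vectors T
        self.emissions = nn.Linear(self.bert.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)             # models transition probabilities between tags

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        scores = self.emissions(hidden)                        # plays the role of the sequence tag probabilities
        mask = attention_mask.bool()
        if tags is not None:
            return -self.crf(scores, tags, mask=mask)          # training loss (negative log likelihood)
        return self.crf.decode(scores, mask=mask)              # inference: final BIEO tag sequence
```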
In an embodiment, concatenating the character vector, the first position information and the corresponding sentence identifier to obtain the sequence data includes:
segmenting the target map text data into compound words and acquiring second position information of each compound word in its sentence;
vectorizing the compound words to obtain word vectors, and concatenating the character vectors, the first position information, the word vectors, the second position information and the corresponding sentence identifiers to obtain the sequence data.
In this embodiment, vocabulary information and its position information are added to the sequence data to be input into the model, so as to enhance the model's ability to judge word boundaries.
Specifically, in addition to being segmented into single characters, for which the first position information and character vectors are obtained, the target map text data are also segmented into compound words, that is, words composed of at least two characters. Second position information of each compound word in its sentence is then acquired, and each compound word is vectorized to obtain its word vector.
The character vector, first position information and corresponding sentence identifier of each single character are concatenated in order, and the word vector, second position information and corresponding sentence identifier of each compound word are concatenated as well, which together yield the sequence data.
In one embodiment, acquiring the second position information of a compound word in its sentence includes:
determining the first character of the compound word and the last character of the compound word;
and combining the position identifier of the first character in the sentence with the position identifier of the last character in the sentence to obtain the second position information.
In this embodiment, the second position information of a compound word mainly describes the head and tail positions of the compound word.
Specifically, the first character and the last character of the compound word are determined, the position identifiers of the first character and of the last character in the sentence are determined, and the two identifiers are combined to obtain the second position information of the compound word. For example, in the sentence "小猫喜欢吃鱼" ("the kitten likes to eat fish"), the characters are numbered 1 to 6 in order. The compound word "小猫" ("kitten") occupies characters 1-2, so its second position information is (1,2); the compound word "喜欢" ("like") occupies characters 3-4, so its second position information is (3,4).
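For illustration only, a minimal sketch of deriving the second position information. The matching vocabulary passed in is an assumption (the disclosure does not fix how compound words are identified), and the Chinese example string is a reconstruction of the translated example above.

```python
from typing import Dict, List, Tuple

def compound_word_spans(sentence: str, vocabulary: List[str]) -> Dict[str, Tuple[int, int]]:
    """Return, for each compound word found in the sentence, the pair
    (position of its first character, position of its last character), 1-based."""
    spans = {}
    for word in vocabulary:
        start = sentence.find(word)
        if start != -1 and len(word) >= 2:          # compound words consist of at least two characters
            spans[word] = (start + 1, start + len(word))
    return spans

# Reconstructed example from the description:
# compound_word_spans("小猫喜欢吃鱼", ["小猫", "喜欢"]) -> {"小猫": (1, 2), "喜欢": (3, 4)}
```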
Fig. 4 shows a data processing diagram of the model in the process of extracting the tag according to an embodiment of the present application.
Referring to fig. 4, in this embodiment, the target map text data whose labels are to be extracted is the phrase "重庆人和火锅" ("Chongqing Renhe hotpot"). The sequence data input into the BERT model include, in addition to the first position information of each single character, the second position information of the compound words.
Specifically, the first position information of "重" is (1,1), that of "庆" is (2,2), that of "人" is (3,3), that of "和" is (4,4), that of "火" is (5,5), and that of "锅" is (6,6).
The first character of "重庆" ("Chongqing") is "重" and its last character is "庆", so its second position information is (1,2). The first character of "人和火锅" is "人" and its last character is "锅", so its second position information is (3,6). The first character of "火锅" ("hotpot") is "火" and its last character is "锅", so its second position information is (5,6).
In one embodiment, the target map text data include first map text data in a first language and second map text data in a second language, and the first map text data and the second map text data have the same semantics.
Inputting the sequence data into the initial model or the target model to obtain the sequence tag probabilities output by the model that receives the sequence data includes: inputting the sequence data into the initial model or the target model to obtain the first sequence tag probabilities that the model outputs for the first map text data.
Acquiring the labels of the target map text data based on the sequence tag probabilities includes: acquiring the labels of the first map text data based on the first sequence tag probabilities.
In this embodiment, parallel corpora are concatenated to train the model, so as to improve the model's label extraction across languages.
Specifically, the target map text data used to train the model include corpora in two languages: first map text data in a first language and second map text data in a second language.
The first map text data and the second map text data with the same semantics are segmented, vectorized and concatenated together. The concatenated sequence data are input into the model, and the model is trained to output, for the first map text data, the corresponding first sequence tag probabilities, from which the labels of the first map text data are extracted. The second map text data may be obtained by translating the first map text data into the second language.
When extracting the labels of the first map text data in the first language, the model also considers the second map text data in the second language with the same semantics, so the model can well integrate its label extraction capability for the first language and the second language.
Furthermore, the method of this embodiment can reduce the deviation caused by translation when the training corpus of the second language is expanded from the training corpus of the first language. In detail, when the first language is Chinese and the second language is a low-resource language, the Chinese training corpus is far richer. To make the model accurate for the low-resource language, its training corpus needs to be expanded. If a corpus obtained by translating the Chinese corpus is used directly to expand the low-resource corpus, the deviation introduced by translation makes the model perform poorly on the low-resource language. Therefore, to reduce this deviation, the Chinese training corpus may be translated into the corresponding low-resource corpus, the two are concatenated, and the model is trained to extract the labels of the Chinese corpus. The low-resource training corpus is then expanded by translating the Chinese training corpus, the expanded low-resource corpus is translated back into the corresponding Chinese corpus, the two are concatenated, and the model is trained to extract the labels of the expanded low-resource corpus. Through this concatenation, the deviation caused by translation during corpus expansion is reduced, and the model keeps a good label extraction effect after migrating to different languages.
FIG. 5 is a schematic diagram illustrating the data processing of the model when labels are extracted from concatenated multilingual corpora according to an embodiment of the present application.
Referring to fig. 5, in this embodiment, the corpus for training the model consists of a Chinese training corpus and an English training corpus. Specifically, the Chinese training corpus means "hello world." and the English training corpus is "Hello World". The two are segmented, vectorized and concatenated, then input into the BERT model, and the model is trained to extract the labels of the Chinese training corpus. Because the model considers the English training corpus with the same semantics when extracting the labels of the Chinese training corpus, it can well integrate its label extraction capability for Chinese and English. Furthermore, the deviation caused by translation when the English training corpus is expanded from the Chinese training corpus can be reduced, which helps a model mainly used for extracting labels from Chinese corpora to migrate to an English application scenario while preserving its label extraction effect on English corpora.
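For illustration only, a minimal sketch of the parallel-corpus concatenation, assuming the Hugging Face `transformers` tokenizer is used; the `bert-base-multilingual-cased` checkpoint stands in for the actual multilingual encoder, and the Chinese string is a reconstruction of the "hello world" corpus mentioned above. The tokenizer builds a `[CLS] first [SEP] second [SEP]` sequence whose token type ids play the role of the sentence identifiers, and labels are predicted only for the first-language segment.

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")

def encode_parallel_pair(first_lang_text: str, second_lang_text: str):
    # Concatenate the first-language corpus with its translation so the model
    # can consult the second language while labeling the first language.
    return tokenizer(first_lang_text, second_lang_text, return_tensors="pt")

# Example corresponding to fig. 5: Chinese "你好世界。" paired with English "Hello World".
batch = encode_parallel_pair("你好世界。", "Hello World")
```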
In one embodiment, training the initial model for label extraction based on the training set in which the map text data are labeled with labels in advance includes:
expanding the training set by replacing vocabulary in the training set with synonyms;
or, expanding the training set by replacing entity vocabulary with entity vocabulary of the same type;
or, expanding the training set by shuffling sentence fragments.
In this embodiment, before the initial model is trained, the training set is expanded through data augmentation.
Specifically, three data augmentation modes are used: synonym replacement, entity replacement, and fragment shuffling.
For synonym replacement, a synonym table may be established in advance; the vocabulary of the map text data in the training set is then randomly replaced according to a binomial distribution using the synonym table, new labeled map text data are obtained, and the new map text data are added to the training set. When sequence labeling uses the BIEO scheme, if the replacing synonym spans more than one token, the BIEO labels are extended accordingly.
For entity replacement, similarly to synonym replacement, a table of same-type entity vocabulary may be established in advance; the entity vocabulary of the map text data in the training set is then randomly replaced according to a binomial distribution using the same-type entity vocabulary, new labeled map text data are obtained, and the new map text data are added to the training set. When sequence labeling uses the BIEO scheme, if the replacing entity vocabulary spans more than one token, the BIEO labels are extended accordingly.
For fragment shuffling, sentences may be segmented at punctuation marks to obtain sentence fragments. The order of the sentence fragments is then randomly shuffled to obtain new map text data, which are added to the training set.
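For illustration only, a minimal sketch of the three augmentation modes. The synonym table, the same-type entity vocabulary and the replacement probability are hypothetical, and the extension of BIEO labels when a replacement spans more than one token is omitted.

```python
import random
import re
from typing import List

SYNONYMS = {"迷宫": ["迷阵"], "跑酷": ["酷跑"]}      # hypothetical synonym table
CITY_ENTITIES = ["重庆", "北京", "上海"]             # hypothetical same-type entity vocabulary

def synonym_replace(tokens: List[str], p: float = 0.2) -> List[str]:
    # replace each word with one of its synonyms with probability p (binomial sampling)
    return [random.choice(SYNONYMS[t]) if t in SYNONYMS and random.random() < p else t
            for t in tokens]

def entity_replace(tokens: List[str], pool: List[str] = CITY_ENTITIES, p: float = 0.2) -> List[str]:
    # replace each entity with another entity of the same type with probability p
    return [random.choice(pool) if t in pool and random.random() < p else t for t in tokens]

def fragment_shuffle(text: str) -> str:
    # segment the sentence at punctuation marks, then shuffle the fragment order
    fragments = [f for f in re.split(r"(?<=[，。！？,.!?])", text) if f]
    random.shuffle(fragments)
    return "".join(fragments)
```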
Fig. 6 shows a block diagram of a map label obtaining apparatus according to an embodiment of the present application, the apparatus including:
a first training module 210 configured to train an initial model for label extraction based on a training set in which map text data are labeled with labels in advance;
a first label module 220 configured to acquire a data set of map text data whose labels are to be annotated, and to extract labels of the map text data in the data set by using the initial model;
a verification module 230 configured to acquire a manual verification result for the labels of the map text data in the data set, and to screen, from the data set based on the manual verification result, incremental annotation data that have passed verification and carry labels;
a second training module 240 configured to adjust and train the initial model based on the incremental annotation data to obtain a target model whose model index meets the business requirement;
and a second label module 250 configured to acquire labels of user-generated content in the map based on the target model.
In an exemplary embodiment of the present application, the apparatus is configured to:
segmenting target map text data into single characters, and acquiring first position information of each single character in its sentence;
vectorizing the single characters to obtain character vectors, and concatenating the character vectors, the first position information and the corresponding sentence identifiers to obtain sequence data;
inputting the sequence data into the initial model or the target model to obtain the sequence tag probabilities output by the model that receives the sequence data;
and acquiring the labels of the target map text data based on the sequence tag probabilities.
In an exemplary embodiment of the present application, the apparatus is configured to:
segmenting the target map text data into compound words, and acquiring second position information of each compound word in its sentence;
vectorizing the compound words to obtain word vectors, and concatenating the character vectors, the first position information, the word vectors, the second position information and the corresponding sentence identifiers to obtain the sequence data.
In an exemplary embodiment of the present application, the apparatus is configured to:
determining the first character of the compound word and the last character of the compound word;
and combining the position identifier of the first character in the sentence with the position identifier of the last character in the sentence to obtain the second position information.
In an exemplary embodiment of the present application, the apparatus is configured to:
converting the sequence tag probabilities by using a conditional random field (CRF) to obtain the labels of the target map text data.
In an exemplary embodiment of the present application, the target map text data include first map text data in a first language and second map text data in a second language, and the first map text data and the second map text data have the same semantics; the apparatus is configured to:
input the sequence data into the initial model or the target model to obtain the first sequence tag probabilities that the model receiving the sequence data outputs for the first map text data;
and acquire the labels of the first map text data based on the first sequence tag probabilities.
In an exemplary embodiment of the present application, the apparatus is configured to:
expanding the training set by replacing vocabulary in the training set with synonyms;
or, expanding the training set by replacing entity vocabulary with entity vocabulary of the same type;
or, expanding the training set by shuffling sentence fragments.
An electronic device 30 according to an embodiment of the present application is described below with reference to fig. 7. The electronic device 30 shown in fig. 7 is only an example, and should not bring any limitation to the functions and the use range of the embodiment of the present application.
As shown in fig. 7, the electronic device 30 is in the form of a general purpose computing device. The components of the electronic device 30 may include, but are not limited to: the at least one processing unit 310, the at least one memory unit 320, and a bus 330 that couples various system components including the memory unit 320 and the processing unit 310.
The storage unit stores program code executable by the processing unit 310, so that the processing unit 310 performs the steps according to various exemplary embodiments of the present invention described in the exemplary method section above of this specification. For example, the processing unit 310 may perform the steps shown in fig. 1.
The storage unit 320 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 3201 and/or a cache memory unit 3202, and may further include a read only memory unit (ROM) 3203.
The storage unit 320 may also include a program/utility 3204 having a set (at least one) of program modules 3205, such program modules 3205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 30 may also communicate with one or more external devices 400 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 30, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 30 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 350. An input/output (I/O) interface 350 is connected to the display unit 340. Also, the electronic device 30 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 360. As shown, the network adapter 360 communicates with the other modules of the electronic device 30 via the bus 330. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 30, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present application.
In an exemplary embodiment of the present application, there is also provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method described in the above method embodiment section.
According to an embodiment of the present application, there is also provided a program product for implementing the method in the above method embodiment, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this respect, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit according to embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, although the steps of the methods herein are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
Claims (10)
1. A map label acquisition method, comprising:
training an initial model for label extraction based on a training set in which map text data are labeled with labels in advance;
acquiring a data set of map text data whose labels are to be annotated, and extracting labels of the map text data in the data set by using the initial model;
acquiring a manual verification result for the labels of the map text data in the data set, and screening, from the data set based on the manual verification result, incremental annotation data that have passed verification and carry labels;
adjusting and training the initial model based on the incremental annotation data to obtain a target model whose model index meets the business requirement;
and acquiring labels of user-generated content in the map based on the target model.
2. The method of claim 1, further comprising:
segmenting target map text data into single characters, and acquiring first position information of each single character in its sentence;
vectorizing the single characters to obtain character vectors, and concatenating the character vectors, the first position information and the corresponding sentence identifiers to obtain sequence data;
inputting the sequence data into the initial model or the target model to obtain the sequence tag probabilities output by the model that receives the sequence data;
and acquiring the labels of the target map text data based on the sequence tag probabilities.
3. The method of claim 2, wherein concatenating the character vector, the first position information and the corresponding sentence identifier to obtain the sequence data comprises:
segmenting the target map text data into compound words, and acquiring second position information of each compound word in its sentence;
vectorizing the compound words to obtain word vectors, and concatenating the character vectors, the first position information, the word vectors, the second position information and the corresponding sentence identifiers to obtain the sequence data.
4. The method of claim 3, wherein acquiring the second position information of the compound word in its sentence comprises:
determining the first character of the compound word and the last character of the compound word;
and combining the position identifier of the first character in the sentence with the position identifier of the last character in the sentence to obtain the second position information.
5. The method of claim 2, wherein acquiring the labels of the target map text data based on the sequence tag probabilities comprises:
converting the sequence tag probabilities by using a conditional random field (CRF) to obtain the labels of the target map text data.
6. The method of claim 2, wherein the target map text data include first map text data in a first language and second map text data in a second language, the first map text data being semantically identical to the second map text data;
inputting the sequence data into the initial model or the target model to obtain the sequence tag probabilities output by the model that receives the sequence data comprises: inputting the sequence data into the initial model or the target model to obtain the first sequence tag probabilities that the model receiving the sequence data outputs for the first map text data;
and acquiring the labels of the target map text data based on the sequence tag probabilities comprises: acquiring the labels of the first map text data based on the first sequence tag probabilities.
7. The method of claim 1, wherein training the initial model for label extraction based on the training set in which the map text data are labeled with labels in advance comprises:
expanding the training set by replacing vocabulary in the training set with synonyms;
or, expanding the training set by replacing entity vocabulary with entity vocabulary of the same type;
or, expanding the training set by shuffling sentence fragments.
8. A map label acquisition apparatus, comprising:
a first training module configured to train an initial model for label extraction based on a training set in which map text data are labeled with labels in advance;
a first label module configured to acquire a data set of map text data whose labels are to be annotated, and to extract labels of the map text data in the data set by using the initial model;
a verification module configured to acquire a manual verification result for the labels of the map text data in the data set, and to screen, from the data set based on the manual verification result, incremental annotation data that have passed verification and carry labels;
a second training module configured to adjust and train the initial model based on the incremental annotation data to obtain a target model whose model index meets the business requirement;
and a second label module configured to acquire labels of user-generated content in the map based on the target model.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to carry out the method of any one of claims 1 to 7.
10. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor of a computer, cause the computer to perform the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210860731.8A CN115269840A (en) | 2022-07-21 | 2022-07-21 | Map label obtaining method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210860731.8A CN115269840A (en) | 2022-07-21 | 2022-07-21 | Map label obtaining method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115269840A true CN115269840A (en) | 2022-11-01 |
Family
ID=83767097
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210860731.8A Pending CN115269840A (en) | 2022-07-21 | 2022-07-21 | Map label obtaining method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115269840A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |