WO2022142011A1 - Address recognition method and device, and storage medium - Google Patents

Info

Publication number
WO2022142011A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
address
question
text information
entity recognition
Prior art date
Application number
PCT/CN2021/090433
Other languages
English (en)
Chinese (zh)
Inventor
张稳
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022142011A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present application relates to the technical field of speech processing in artificial intelligence, and in particular to an address recognition method, device, computer equipment and storage medium based on named entity recognition.
  • Human-machine dialogue is an important area of artificial intelligence.
  • Dialogue is a basic human communication skill, and the most important requirement for communicating naturally and smoothly in dialogue is understanding the other party's intention.
  • In artificial intelligence, achieving a human-like effect requires the cooperation of various applications and systems.
  • The most critical and most basic step in supporting this function is to correctly identify the intention of human speech, so that the machine can respond correctly.
  • One existing semantic recognition method constructs a training corpus and trains a deep learning model on it, so that the model can identify the question and answer text information corresponding to the training corpus and thereby determine the actual intention of that text.
  • The purpose of the embodiments of the present application is to propose an address recognition method, device, computer equipment and storage medium based on named entity recognition, so as to solve the problems that the traditional semantic recognition method cannot be applied to semi-closed human-machine dialogue situations and that the deep learning model has weak generalization ability.
  • the embodiment of the present application provides an address identification method based on named entity identification, and adopts the following technical solutions:
  • the target address result is output.
  • the embodiments of the present application also provide an address identification device based on named entity identification, which adopts the following technical solutions:
  • the audio acquisition module is used to receive the question and answer audio data sent by the audio acquisition device;
  • a speech recognition module used for performing speech recognition operation on the question and answer audio data to obtain question and answer text information
  • the address text extraction module is used for performing an address text extraction operation on the question and answer text information to obtain the address text information
  • a vector conversion module for inputting the address text information into the Embedding layer to perform a vector conversion operation to obtain an address text vector
  • a feature expansion module for inputting the question and answer text information and the address text vector into the CNN model to perform a feature expansion operation to obtain an expanded text vector
  • an entity recognition module for inputting the address text vector and the expanded text vector into the trained named entity recognition model for entity recognition operation to obtain a target address result
  • a result output module used for outputting the target address result.
  • the embodiment of the present application also provides a computer device, which adopts the following technical solutions:
  • a memory and a processor are included, and computer-readable instructions are stored in the memory.
  • When the processor executes the computer-readable instructions, it implements the steps of the address recognition method based on named entity recognition as described below:
  • the target address result is output.
  • the embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solutions:
  • Computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by the processor, the steps of the address recognition method based on named entity recognition as described below are implemented:
  • the target address result is output.
  • The address recognition method based on named entity recognition includes: receiving question and answer audio data sent by an audio collection device; performing a speech recognition operation on the question and answer audio data to obtain question and answer text information; performing an address text extraction operation on the question and answer text information to obtain address text information; inputting the address text information into the Embedding layer for a vector conversion operation to obtain an address text vector; inputting the question and answer text information and the address text vector into the CNN model for a feature expansion operation to obtain an expanded text vector; inputting the address text vector and the expanded text vector into the trained named entity recognition model for an entity recognition operation to obtain a target address result; and outputting the target address result.
  • The audio information is converted into text information and then into a question and answer text vector, and the question and answer text vector is input into the CNN model, where feature information is combined to obtain the expanded text vector. Finally, the question and answer text vector and the expanded text vector are input into the trained named entity recognition model for named entity recognition, and the target address result is obtained. Because the expanded text vector combines the feature information of the phrase following each token with the feature information of the token itself, it improves the model's generalization ability for extracting entities with a specific range of suffixes without requiring a large amount of data for fitting, which reduces model training cost and improves the model's recognition ability.
  • FIG. 1 is an implementation flowchart of the address recognition method based on named entity recognition provided by Embodiment 1 of the present application;
  • FIG. 2 is a flowchart of a specific implementation of step S103 in FIG. 1;
  • FIG. 3 is a flowchart of another specific implementation of step S103 in FIG. 1;
  • FIG. 4 is an implementation flowchart of obtaining the trained named entity recognition model provided by Embodiment 1 of the present application;
  • FIG. 5 is a flowchart of a specific implementation of step S401 in FIG. 4;
  • FIG. 6 is a flowchart of a specific implementation of step S402 in FIG. 4;
  • FIG. 7 is a schematic structural diagram of an address recognition device based on named entity recognition provided in Embodiment 2 of the present application;
  • FIG. 8 is a schematic structural diagram of a specific implementation of the address text extraction module 130 in FIG. 7;
  • FIG. 9 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • FIG. 1 shows a flowchart for realizing the address identification method based on named entity identification provided by Embodiment 1 of the present application. For the convenience of description, only the part related to the present application is shown.
  • the above-mentioned address identification method based on named entity identification includes the following steps:
  • Step S101 Receive the question and answer audio data sent by the audio collection device.
  • The question and answer audio data refers to a waveform file obtained by converting the audio signal of a phone call into a waveform signal.
  • the question and answer audio data can be obtained by importing audio signals collected by a microphone, a telephone or other equipment into the computer through a digital audio interface in the computer for recording.
  • Step S102 Perform a voice recognition operation on the question and answer audio data to obtain the question and answer text information.
  • the speech recognition operation is mainly used to convert the above-mentioned collected question and answer audio data into text data.
  • The speech recognition operation can be realized by a pattern matching method.
  • Each word is said in turn, and its feature vector is stored in the template library as a template.
  • In the recognition stage, the feature vector of the input speech is compared in turn with each template in the template library, and the template with the highest similarity is output as the recognition result.
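  • The template matching described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the three-dimensional feature vectors, the `TEMPLATES` library and the choice of cosine similarity are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recognize(feature, template_library):
    """Return the template word whose stored vector is most similar to `feature`."""
    return max(template_library,
               key=lambda word: cosine_similarity(feature, template_library[word]))

# Hypothetical toy templates; a real system would store acoustic feature vectors.
TEMPLATES = {
    "yes": [0.9, 0.1, 0.0],
    "no": [0.1, 0.9, 0.1],
    "address": [0.2, 0.2, 0.9],
}

# The input vector is compared with every template in turn.
best = recognize([0.15, 0.25, 0.8], TEMPLATES)
```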
  • The recognized text can be separated by speaker according to each speaker's waveform characteristics and displayed in a "one question, one answer" form, so as to obtain the question and answer text information of the customer service staff and the question and answer text information of the user.
  • Step S103 performing an address text extraction operation on the question and answer text information to obtain address text information.
  • The address text extraction operation may be: performing a word segmentation operation on the question and answer text information to obtain a plurality of words, and filtering the words based on the stop word table to obtain the filtered address text information.
  • Alternatively, the address text extraction operation may be: performing a word segmentation operation on the question and answer text information to obtain a plurality of words; filtering the words based on the stop word table to obtain the filtered words to be confirmed; calculating the first word frequency of each word to be confirmed in the question and answer text information; reading the local corpus and calculating the second word frequency of each word to be confirmed in the local corpus; and filtering the words to be confirmed according to the product of the first word frequency and the second word frequency to obtain the address text information.
  • Step S104 Input the address text information into the Embedding layer to perform a vector conversion operation to obtain an address text vector.
  • the vector transformation operation refers to inputting the question and answer text information into the Embedding layer for vector transformation to obtain the question and answer text vector.
  • Step S105 Input the question-and-answer text information and the address text vector into the CNN model to perform a feature expansion operation to obtain an expanded text vector.
  • the CNN performs expansion processing on the obtained question and answer text vector through a sliding window, that is, adding contextual feature information to obtain an expanded text vector with contextual feature information expanded.
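  • The idea of adding contextual features through a sliding window can be illustrated with a simplified sketch. It assumes a fixed, uniform window average standing in for the learned convolution filters of a real CNN, and the function name `expand_with_context` and the toy vectors are hypothetical.

```python
def expand_with_context(vectors, window=3):
    """Append to each token vector the mean of its sliding-window neighbourhood."""
    half = window // 2
    dim = len(vectors[0])
    expanded = []
    for i, vec in enumerate(vectors):
        # The window is clipped at the sequence boundaries.
        lo, hi = max(0, i - half), min(len(vectors), i + half + 1)
        neighbourhood = vectors[lo:hi]
        context = [sum(v[d] for v in neighbourhood) / len(neighbourhood)
                   for d in range(dim)]
        expanded.append(vec + context)  # original features followed by context features
    return expanded

token_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
expanded = expand_with_context(token_vectors)
```

  • Each output vector is twice as long as its input: the first half is the token's own features and the second half summarizes its neighbours, which is the sense in which the vector is "expanded".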
  • Step S106 Input the address text vector and the expanded text vector into the trained named entity recognition model to perform named entity recognition operation, and obtain the target address result.
  • The expanded text vector, which carries the added contextual feature information, is combined with the original question and answer text vector and input into the trained named entity recognition model. Combining the expanded text vector produced by the CNN model with the question and answer text vector produced by vector conversion adds contextual feature information and improves the generalization ability of the trained named entity recognition model for extracting entities with a specific range of suffixes, especially long-tail address entities (such as "*** Mongolia Autonomous County"): through the CNN sliding window, more context information about a long-tail suffix such as "Mongolia Autonomous County" is passed to the downstream network layers for parameter learning.
  • The "***" area in the customer's answer is extracted by the NER model and then looked up in the national address database, where the address is retrieved through fuzzy matching of words and sounds, to judge whether the address stated by the customer actually exists and which administrative level it belongs to. If the administrative level of the address is a district (county), the city to which it belongs is looked up, and the district (county) level address in the customer's answer text is replaced by that city, completing the text preprocessing.
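  • The lookup and replacement step can be sketched as below. The miniature `DISTRICT_TO_CITY` table, the `cutoff` value and the function name are assumptions for illustration, and only string similarity (via the standard library's `difflib`) is shown; the phonetic matching mentioned in the text is not modelled.

```python
import difflib

# Hypothetical miniature address database: district (county) -> parent city.
DISTRICT_TO_CITY = {
    "Nanshan District": "Shenzhen",
    "Futian District": "Shenzhen",
    "Haidian District": "Beijing",
}
CITIES = {"Shenzhen", "Beijing"}

def normalize_address(entity, cutoff=0.8):
    """Return (matched_name, city), or None when no database entry is close enough."""
    names = list(DISTRICT_TO_CITY) + sorted(CITIES)
    match = difflib.get_close_matches(entity, names, n=1, cutoff=cutoff)
    if not match:
        return None  # the address is treated as not existing
    name = match[0]
    # A district-level match is replaced by its city; a city maps to itself.
    return name, DISTRICT_TO_CITY.get(name, name)

result = normalize_address("Nanshan Distrct")  # misspelled on purpose
```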
  • Step S107 output the target address result.
  • The provided address recognition method based on named entity recognition includes: receiving question and answer audio data sent by an audio collection device; performing a speech recognition operation on the question and answer audio data to obtain question and answer text information; performing an address text extraction operation on the question and answer text information to obtain address text information; inputting the address text information into the Embedding layer for a vector conversion operation to obtain an address text vector; inputting the question and answer text information and the address text vector into the CNN model for a feature expansion operation to obtain an expanded text vector; inputting the address text vector and the expanded text vector into the trained named entity recognition model for an entity recognition operation to obtain a target address result; and outputting the target address result.
  • The audio information is converted into text information and then into a question and answer text vector, and the question and answer text vector is input into the CNN model, where feature information is combined to obtain the expanded text vector. Finally, the question and answer text vector and the expanded text vector are input into the trained named entity recognition model for named entity recognition, and the target address result is obtained. Because the expanded text vector combines the feature information of the phrase following each token with the feature information of the token itself, it improves the model's generalization ability for extracting entities with a specific range of suffixes without requiring a large amount of data for fitting, which reduces model training cost and improves the model's recognition ability.
  • Referring to FIG. 2, a flowchart of a specific implementation of step S103 in FIG. 1 is shown. For convenience of description, only the parts related to the present application are shown.
  • Step S103 specifically includes step S201 and step S202.
  • Step S201 Perform word segmentation on the question and answer text information to obtain a plurality of words.
  • Word segmentation methods fall into two categories. The first is based on string matching (mechanical word segmentation): the string is scanned, and if a substring is found to be the same as a word in the dictionary, it is considered a match.
  • This kind of word segmentation usually adds some heuristic rules, such as "forward/reverse maximum matching" and "long word first".
  • The second category is the word segmentation method based on statistics and machine learning.
  • The model parameters are trained on observed data (a labeled corpus); in the word segmentation stage, the model calculates the probability of each candidate segmentation, the segmentation with the highest probability is taken as the final result, and the address text information is finally obtained word by word.
  • The address text information may be a general term for all the words, not necessarily the names of the main words in the question and answer text information.
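  • As an illustration of the string-matching category, the following is a minimal sketch of forward maximum matching. The toy `DICTIONARY`, the `max_len` limit and the sample sentence are assumptions for the example, not taken from the application.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation that prefers the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when nothing in the dictionary matches.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical toy dictionary.
DICTIONARY = {"深圳", "深圳市", "南山", "南山区", "我", "住在"}
tokens = forward_max_match("我住在深圳市南山区", DICTIONARY)
```

  • Because the longest candidate is tried first, "深圳市" is chosen over "深圳", which is the "long word first" heuristic in its simplest form.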
  • Step S202 Perform a filtering operation on the words based on the stop word table to obtain filtered address text information.
  • The address text information obtained after word segmentation can also be filtered according to the stop word table to remove unimportant words (also called stop words), for example "ah" and "oh".
  • Referring to FIG. 3, a flowchart of another specific implementation of step S103 in FIG. 1 is shown. For convenience of description, only the parts related to the present application are shown.
  • Step S103 specifically includes step S301, step S302, step S303, step S304 and step S305.
  • Step S301 Perform word segmentation on the question and answer text information to obtain a plurality of words.
  • Word segmentation methods fall into two categories. The first is based on string matching (mechanical word segmentation): the string is scanned, and if a substring is found to be the same as a word in the dictionary, it is considered a match.
  • This kind of word segmentation usually adds some heuristic rules, such as "forward/reverse maximum matching" and "long word first".
  • The second category is the word segmentation method based on statistics and machine learning.
  • The model parameters are trained on observed data (a labeled corpus); in the word segmentation stage, the model calculates the probability of each candidate segmentation, the segmentation with the highest probability is taken as the final result, and the address text information is finally obtained word by word.
  • The address text information may be a general term for all the words, not necessarily the names of the main words in the question and answer text information.
  • Step S302 Perform a filtering operation on words based on the stop word table to obtain filtered words to be confirmed.
  • The address text information obtained after word segmentation can also be filtered according to the stop word table to remove unimportant words (also called stop words), for example "ah" and "oh".
  • Step S303 Calculate the first word frequency of each word to be confirmed in the question and answer text information.
  • When a word to be confirmed occurs very frequently, the probability of it being a stop word is relatively high; the first word frequency is mainly used to determine whether the word to be confirmed is a stop word.
  • Step S304 Read the local corpus, and calculate the second word frequency of each word to be confirmed in the local corpus.
  • K2 is the second word frequency;
  • n is the total number of documents in the corpus;
  • m is the number of documents containing a given word. The more common a word is, the closer K2 is to 0. The denominator is increased by 1 to avoid a denominator of 0, which would occur when no document contains the word. It follows that if a word such as "Anyway" appears in the input text but is relatively rare in the corpus, it may be more important in the current input text and is most likely a key word of that text.
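  • The formula for the second word frequency does not survive in this text. Under the symbol definitions above (n documents in total, m documents containing the word, a denominator increased by 1, and a value approaching 0 for common words), it is consistent with the standard inverse document frequency, which is offered here as an assumed reconstruction rather than the application's own expression:

```latex
K_2 = \log\frac{n}{m + 1}
```

  • For example, with n = 4 and a word that appears in m = 3 documents, K_2 = \log(4/4) = 0, while a word that appears in no document gives K_2 = \log 4.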
  • Step S305 Filter the words to be confirmed according to the product of the first word frequency and the second word frequency to obtain address text information.
  • The words are filtered through regular expressions to obtain the words to be confirmed, and the first word frequency of each word to be confirmed in the question and answer text information is then calculated. After the second word frequency of each word to be confirmed in the corpus is obtained, the words to be confirmed are filtered according to the first word frequency and the second word frequency to obtain the filtered address text information.
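  • Steps S302 to S305 can be sketched together as follows. Several details are assumptions made for the example: the first word frequency is taken as the word's share of the candidate words, the second word frequency as log(n / (m + 1)), and the `threshold` cutoff, the toy corpus and the function name are hypothetical.

```python
import math
from collections import Counter

STOP_WORDS = {"ah", "oh"}  # the two examples given in the text

def filter_address_words(words, corpus_docs, threshold):
    """Keep words whose K1 * K2 score reaches `threshold` (a hypothetical cutoff)."""
    candidates = [w for w in words if w not in STOP_WORDS]
    counts = Counter(candidates)
    n = len(corpus_docs)
    scores = {}
    for word, count in counts.items():
        k1 = count / len(candidates)  # assumed definition of the first word frequency
        m = sum(1 for doc in corpus_docs if word in doc)
        k2 = math.log(n / (m + 1))    # second word frequency, denominator increased by 1
        scores[word] = k1 * k2
    kept = [w for w in candidates if scores[w] >= threshold]
    return kept, scores

# Toy corpus: each document is represented as a set of words.
corpus = [{"i", "live", "in"}, {"i", "am", "in"}, {"live", "in", "shenzhen"}, {"i", "live"}]
words = ["oh", "i", "live", "in", "nanshan", "ah"]
kept, scores = filter_address_words(words, corpus, threshold=0.1)
```

  • In this toy run the common words score 0 because each appears in three of the four documents, so only the rare word survives the filter.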
  • Referring to FIG. 4, an implementation flowchart of obtaining the trained named entity recognition model provided by Embodiment 1 of the present application is shown. For convenience of description, only the parts related to the present application are shown.
  • Before step S106, the method further includes step S401 and step S402.
  • Step S401 Obtain an initial training set and a data set to be identified.
  • The initial training set is obtained by preprocessing the labeled data set as follows: the text in the labeled data set is split into sentences according to the sentence segmentation rules; each sentence is segmented into words, each word carrying a label; the word dictionary and label dictionary are queried to obtain the word ID and label ID of each word, so that the sentence is represented as word IDs and label IDs; and sentences are padded or truncated so that all sentences have the specified length.
  • The data set to be identified is obtained by preprocessing the unlabeled data set as follows: the text in the unlabeled data set is split into sentences according to the sentence segmentation rules; each sentence is segmented into words according to the preset vocabulary table; the word dictionary is queried to obtain the word ID of each word, so that the sentence is represented as word IDs; and sentences are padded or truncated so that all sentences have the specified length.
  • Sentences can be split according to the sentence segmentation rules by regular expression matching.
  • Step S402 Perform multiple rounds of training operations on the initial named entity recognition model based on the initial training set and the data set to be identified until it converges, to obtain the trained named entity recognition model. Each round of training includes: performing supervised training on the initial named entity recognition model to obtain the supervised-trained model; using the supervised-trained model to label the named entities in the data set to be identified, obtaining the weakly labeled data set to be identified for this round; and extracting a subset from the weakly labeled data set and combining it with the initial training set to form the training set for the next round of training.
  • The weak labels that the named entity recognition model assigns to the data set to be identified during training are used as the labeling result of that data set, and a subset of it is combined with the initial training set to form the training set for the next round. The size of the data set to be identified can be set as needed, so that the training set used to train the named entity recognition model is expanded accordingly and the final model has better generalization ability.
  • Its recognition ability on the data set to be identified is also better.
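  • The round structure can be sketched as a generic self-training loop. The callables `train_fn`, `predict_fn` and `select_fn` are hypothetical placeholders for the supervised training, weak labeling and subset selection steps, and a fixed number of rounds stands in for the convergence test, which the text does not specify.

```python
def self_train(initial_train, unlabeled, train_fn, predict_fn, select_fn, rounds=3):
    """Alternate supervised training and weak labeling for a fixed number of rounds."""
    train_set = list(initial_train)
    model = None
    for _ in range(rounds):
        model = train_fn(train_set)                             # supervised training
        weak = [(x, predict_fn(model, x)) for x in unlabeled]   # weakly labeled data
        subset = select_fn(weak)                                # subset kept for the next round
        train_set = list(initial_train) + subset                # next round's training set
    return model, train_set

# Toy stand-ins: the "model" is just the training-set size, every item is labeled "O".
model, final_set = self_train(
    initial_train=[("a", "O"), ("b", "O")],
    unlabeled=["c", "d"],
    train_fn=lambda data: len(data),
    predict_fn=lambda m, x: "O",
    select_fn=lambda weak: weak[:1],
)
```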
  • Referring to FIG. 5, a flowchart of a specific implementation of step S401 in FIG. 4 is shown. For convenience of description, only the parts related to the present application are shown.
  • Step S401 specifically includes step S501, step S502, step S503, step S504, step S505, step S506, step S507, step S508 and step S509.
  • Step S501 Read the local database, and obtain the pre-labeled data set and the unlabeled data set in the local database.
  • the initial training set is a data set obtained by performing the following preprocessing on the labeled data set;
  • the data set to be identified is a data set obtained by performing the following preprocessing on the unlabeled data set.
  • Step S502 Perform sentence segmentation on the text in the pre-labeled data set according to the sentence segmentation rules to obtain a plurality of pre-labeled sentences.
  • Step S503 Perform word segmentation on each pre-labeled sentence based on the preset word table, to obtain a pre-labeled sentence composed of multiple words, and each word carries label information respectively.
  • the word table may be a word table corresponding to the BERT model pre-trained by Google.
  • Step S504 query the word dictionary and the tag dictionary to obtain the word ID and label ID of each word to convert the pre-labeled sentence into a representation in the form of the word ID and the label ID.
  • the word dictionary and the label dictionary may be the word dictionary and the label dictionary corresponding to the BERT model pre-trained by Google.
  • Each word in the word dictionary has a corresponding word ID.
  • word IDs corresponding to unknown words are also set in the word dictionary. That is, if the word ID of a word is queried in the word dictionary, but the word is not recorded in the dictionary, the query feedback result is the word ID corresponding to the unknown word.
  • Each tag in the tag dictionary has a corresponding tag ID.
  • Step S505 unifying the length of the pre-labeled sentences to obtain an initial training set.
  • The length unification operation refers to padding or truncating sentences to a predetermined length.
  • The predetermined length is the preset maximum sentence length, generally set to 128, that is, the longest sentence contains 128 words. For example, a sentence of fewer than 128 words is padded with 0 at the end up to 128 words, and a sentence of more than 128 words is truncated after the 128th word.
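  • Steps S504 and S505 (ID lookup followed by length unification) can be sketched as below. The length 128 and the padding value 0 come from the text; the value of `UNK_ID`, the toy `word_dict` and the function name `encode` are assumptions for the example.

```python
MAX_LEN = 128  # predetermined length given in the text
PAD_ID = 0     # sentences are padded with 0 at the end
UNK_ID = 1     # hypothetical ID reserved for unknown words

def encode(words, word_dict, max_len=MAX_LEN):
    """Map words to IDs, then pad with PAD_ID or truncate to exactly max_len."""
    ids = [word_dict.get(w, UNK_ID) for w in words][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))

word_dict = {"shenzhen": 2, "nanshan": 3}  # toy word dictionary
short = encode(["shenzhen", "nanshan", "road"], word_dict, max_len=5)
long = encode(["shenzhen"] * 200, word_dict)
```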
  • Step S506 Perform sentence segmentation on the text in the unlabeled data set according to the sentence segmentation rules to obtain a plurality of unlabeled sentences.
  • Step S507 Perform word segmentation on each unlabeled sentence based on the preset word table to obtain an unlabeled sentence composed of multiple words.
  • Step S508 Convert the unlabeled sentence into a word identification form based on the word dictionary.
  • Step S509 Unify the length of the unlabeled sentences to obtain the data set to be recognized.
  • Referring to FIG. 6, a flowchart of a specific implementation of step S402 in FIG. 4 is shown. For convenience of description, only the parts related to the present application are shown.
  • Step S402 specifically includes step S601, step S602, step S603 and step S604.
  • Step S601 Input the sentences of the current round in the data set of the current round into the BERT layer of the BERT-CRF model in the named entity recognition model, and obtain the encoding vectors of the words in the sentences of the current round.
  • Step S602 Input the encoding vector into the CRF layer of the BERT-CRF model, and obtain a probability matrix of the current round of sentences composed of the probability sequences of all labels corresponding to all words in the current round of sentences.
  • Step S603 Obtain the optimal labeling sequence of the probability matrix of each current round of sentences based on the Viterbi algorithm.
  • Step S604 Obtain the identification label of the word according to the optimal labeling sequence, and adjust the parameters of the BERT-CRF model in the named entity recognition model based on the identification label of the word and the label of the word in the annotation data set.
  • The prior art uses a BERT layer plus a fully connected layer to solve the sequence labeling problem.
  • The output vector of a single word is processed by Softmax, and the value of each dimension represents the probability that the word belongs to a certain category. Based on this data, the loss can be calculated and the model can be trained.
  • the present invention replaces the fully connected layer with a CRF layer, and better captures the structural characteristics between tags through the BERT-CRF model.
  • the structure of the BERT-CRF model includes the BERT layer and the CRF layer that are connected in sequence.
  • the words (Word) in the sentence are input into the BERT layer to obtain the encoding vector, and the encoding vector is used as the input of the CRF layer to obtain the probability sequence of all labels corresponding to the words. Then, according to the probability matrix, the Viterbi algorithm is used for decoding to obtain the optimal labeling sequence, and the optimal labeling sequence contains the label (Label) corresponding to the word.
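  • The decoding step can be illustrated with a small, self-contained Viterbi sketch. The emission scores stand in for the per-word label scores and the transition scores for the label-to-label structure a CRF layer learns; the label set and all numbers are made up for the example.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring label index sequence.

    emissions[t][j]: score of label j at position t.
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        step, new_score = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            step.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        backpointers.append(step)
        score = new_score
    # Trace back from the best final label.
    last = max(range(n_labels), key=lambda j: score[j])
    path = [last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    return path[::-1]

LABELS = ["O", "B-LOC", "I-LOC"]  # hypothetical label set
emissions = [[2.0, 1.0, 0.0], [0.0, 2.0, 1.5], [0.0, 1.0, 2.0]]
# The O -> I-LOC transition is penalised, encoding a structural constraint between tags.
transitions = [[0.0, 0.0, -5.0], [0.0, -1.0, 1.0], [0.0, -1.0, 1.0]]
best = [LABELS[k] for k in viterbi(emissions, transitions)]
```

  • The transition scores are what distinguish this from per-word Softmax classification: an "I-LOC" label is only favoured when it follows "B-LOC" or "I-LOC".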
  • The address recognition method based on named entity recognition includes: receiving the question and answer audio data sent by the audio collection device; performing a speech recognition operation on the question and answer audio data to obtain the question and answer text information; performing an address text extraction operation on the question and answer text information to obtain the address text information; inputting the address text information into the Embedding layer for a vector conversion operation to obtain the address text vector; inputting the question and answer text information and the address text vector into the CNN model for a feature expansion operation to obtain the expanded text vector; inputting the address text vector and the expanded text vector into the trained named entity recognition model for an entity recognition operation to obtain the target address result; and outputting the target address result.
  • In the process of man-machine question answering, after the audio information of the user's reply is obtained, the audio information is converted into text information and then into a question and answer text vector, and the question and answer text vector is input into the CNN model.
  • The CNN model combines the feature information of the phrases following a token with the feature information of the token itself to obtain the expanded text vector. Finally, the question and answer text vector and the expanded text vector are input into the trained named entity recognition model for named entity recognition, and the target address result is obtained.
  • Because the expanded text vector combines the feature information of the phrases following the token with the feature information of the token, it improves the generalization ability of the model for entity extraction over a specific range of suffixes without requiring a large amount of data for fitting, which reduces the model training cost and improves the recognition ability of the model.
  • the above question and answer audio data may also be stored in a node of a blockchain.
  • The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • The present application may be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
  • the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM) or the like.
  • the present application provides an embodiment of an address identification device based on named entity recognition, and the device embodiment corresponds to the method embodiment shown in FIG. 1 .
  • the device can be specifically applied to various electronic devices.
  • The address recognition device 100 based on named entity recognition in this embodiment includes: an audio acquisition module 110, a speech recognition module 120, an address text extraction module 130, a vector conversion module 140, a feature expansion module 150, an entity recognition module 160, and a result output module 170, wherein:
  • the audio acquisition module 110 is used to receive the question and answer audio data sent by the audio collection device;
  • the speech recognition module 120 is used to perform a speech recognition operation on the question and answer audio data to obtain the question and answer text information;
  • the address text extraction module 130 is used to perform an address text extraction operation on the question and answer text information to obtain the address text information;
  • the vector conversion module 140 is used to input the address text information into the Embedding layer to perform a vector conversion operation to obtain the address text vector;
  • the feature expansion module 150 is used to input the question and answer text information and the address text vector into the CNN model for a feature expansion operation to obtain the expanded text vector;
  • the entity recognition module 160 is used to input the address text vector and the expanded text vector into the trained named entity recognition model to perform an entity recognition operation and obtain the target address result;
  • the result output module 170 is used to output the target address result.
  • The question and answer audio data refers to a waveform file obtained by converting the audio signal of a phone call into a waveform signal.
  • the question and answer audio data can be obtained by importing audio signals collected by a microphone, a telephone or other equipment into the computer through a digital audio interface in the computer for recording.
  • the speech recognition operation is mainly used to convert the above-mentioned collected question and answer audio data into text data.
  • The speech recognition operation can be realized by a pattern matching method.
  • In the training stage, each word is spoken in turn, and its feature vector is stored in the template library as a template.
  • In the recognition stage, the feature vector of the input speech is compared in turn with each template in the template library, and the template with the highest similarity is output as the recognition result.
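  • The pattern matching idea can be illustrated with a small sketch. The source does not name a similarity measure, so cosine similarity is an assumption here, and the three-dimensional feature vectors and the `template_library` contents are made-up placeholders for real acoustic features.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recognize(feature_vector, template_library):
    """Return the word whose stored template is most similar to the input."""
    return max(template_library,
               key=lambda word: cosine_similarity(feature_vector,
                                                  template_library[word]))

# Hypothetical template library built in the training stage: word -> feature vector.
template_library = {
    "city": [0.9, 0.1, 0.3],
    "county": [0.2, 0.8, 0.5],
    "road": [0.1, 0.3, 0.9],
}

# Feature vector of the input speech in the recognition stage (illustrative values).
result = recognize([0.25, 0.75, 0.45], template_library)
```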
  • In the text recognized from the speech, the speakers can be distinguished according to the waveform characteristics of the user, and the text content is displayed in the form of "one question and one answer", so as to obtain the question and answer text information of the customer service staff and the question and answer text information of the user.
  • The address text extraction operation may consist of performing a word segmentation operation on the question and answer text information to obtain a plurality of words, and performing a filtering operation on the words based on the stop word table to obtain the filtered address text information.
  • Alternatively, the address text extraction operation may consist of: performing a word segmentation operation on the question and answer text information to obtain a plurality of words; performing a filtering operation on the words based on the stop word table to obtain the filtered words to be confirmed; calculating the first word frequency of each word to be confirmed in the question and answer text information; reading the local corpus and calculating the second word frequency of each word to be confirmed in the local corpus; and filtering the words to be confirmed according to the product of the first word frequency and the second word frequency to obtain the address text information.
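  • The frequency-based variant can be sketched as below. The source only says that words are filtered "according to the product" of the two frequencies, so keeping words whose product reaches a `threshold` is an assumption, as are the function names and the sample data.

```python
from collections import Counter

def word_frequencies(words):
    """Relative frequency of each word in a list of words."""
    counts = Counter(words)
    total = len(words)
    return {w: c / total for w, c in counts.items()}

def extract_address_words(segmented_words, stop_words, corpus_words, threshold):
    """Stop-word filtering followed by filtering on the frequency product."""
    candidates = [w for w in segmented_words if w not in stop_words]
    first = word_frequencies(segmented_words)   # frequency in the Q&A text
    second = word_frequencies(corpus_words)     # frequency in the local corpus
    kept = []
    for w in candidates:
        score = first[w] * second.get(w, 0.0)
        # Assumption: a word is kept when the product reaches the threshold.
        if score >= threshold and w not in kept:
            kept.append(w)
    return kept

# Illustrative data; a real system would use segmented Chinese text and a large corpus.
segmented = ["ah", "I", "live", "in", "Shenzhen", "Futian", "District", "oh"]
stop_words = {"ah", "oh", "I", "in"}
corpus = ["Shenzhen", "Futian", "District", "Shenzhen", "Nanshan", "District"]

kept = extract_address_words(segmented, stop_words, corpus, threshold=0.01)
```

  • In this toy run, "live" survives the stop word table but is dropped because it never occurs in the address corpus, so its frequency product is zero.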
  • the vector transformation operation refers to inputting the question and answer text information into the Embedding layer for vector transformation to obtain the question and answer text vector.
  • the CNN performs expansion processing on the obtained question and answer text vector through a sliding window, that is, adding contextual feature information to obtain an expanded text vector with contextual feature information expanded.
  • The expanded text vector carrying the expanded contextual feature information is combined with the original question and answer text vector and input into the trained named entity recognition model.
  • Combining the expanded text vector obtained by the CNN model with the question and answer text vector obtained by vector conversion adds contextual feature information and improves the generalization ability of the trained named entity recognition model for entity extraction over a specific range of suffixes, especially long-tail address entities (such as "*** Mongolian Autonomous County"). Through the CNN sliding window, more context information about a long-tail suffix such as "Mongolian Autonomous County" is passed to the downstream network layers for model parameter learning, which improves the generalization ability of the model.
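  • The sliding-window expansion can be approximated by the toy sketch below. It is not a trained CNN: a real convolution layer learns its kernel weights, whereas this sketch uses a fixed, uniform window over each token and the tokens that follow it. The window size, function names, and vectors are all assumptions made for illustration.

```python
def expand_with_sliding_window(token_vectors, window=3):
    """For each token, average its vector with those of the following tokens."""
    dim = len(token_vectors[0])
    expanded = []
    for t in range(len(token_vectors)):
        span = token_vectors[t:t + window]
        # Zero-pad at the end of the sentence so every window has the same width.
        span = span + [[0.0] * dim] * (window - len(span))
        expanded.append([sum(vec[d] for vec in span) / window
                         for d in range(dim)])
    return expanded

def combine(token_vectors, expanded_vectors):
    """Concatenate each original vector with its expanded counterpart."""
    return [orig + extra for orig, extra in zip(token_vectors, expanded_vectors)]

# Illustrative two-dimensional embeddings for a three-token sentence.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
expanded = expand_with_sliding_window(vectors, window=2)
combined = combine(vectors, expanded)
```

  • Each combined vector thus carries both the token's own features and a summary of what follows it, which is the property the description relies on for long suffixes.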
  • The *** area given in the customer's answer is extracted by the NER model and then looked up in the national address database. The address is retrieved through fuzzy matching of words and sounds, and it is judged whether the address said by the customer actually exists and which administrative level it belongs to. If the administrative level of the address is a district (county), the city to which it belongs is looked up, and the district (county) level address in the customer's answer text is replaced by that city, which completes the text preprocessing.
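  • A minimal sketch of this lookup-and-replace step follows, using the standard-library `difflib.get_close_matches` for fuzzy text matching. Phonetic (sound-based) matching is omitted, and the miniature `ADDRESS_DB`, the `cutoff` value, and the function name are hypothetical stand-ins for the national address database and its index.

```python
import difflib

# Hypothetical miniature address database: name -> (administrative level, parent city).
ADDRESS_DB = {
    "Shenzhen": ("city", None),
    "Futian District": ("district", "Shenzhen"),
    "Nanshan District": ("district", "Shenzhen"),
}

def normalize_address(extracted, text, cutoff=0.8):
    """Fuzzy-match an extracted address and lift a district to its city.

    Returns (normalized address or None, possibly rewritten text).
    """
    matches = difflib.get_close_matches(extracted, list(ADDRESS_DB),
                                        n=1, cutoff=cutoff)
    if not matches:
        return None, text  # the address is not found in the database
    name = matches[0]
    level, parent_city = ADDRESS_DB[name]
    if level == "district" and parent_city:
        # Replace the district-level mention in the answer text by its city.
        return parent_city, text.replace(extracted, parent_city)
    return name, text

# The extracted span contains a recognition error ("Distrct").
address, new_text = normalize_address("Futian Distrct", "I live in Futian Distrct")
```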
  • The provided address recognition device based on named entity recognition includes: an audio acquisition module, used to receive the question and answer audio data sent by the audio collection device; a speech recognition module, used to perform a speech recognition operation on the question and answer audio data to obtain the question and answer text information; an address text extraction module, used to perform an address text extraction operation on the question and answer text information to obtain the address text information; a vector conversion module, used to input the address text information into the Embedding layer for a vector conversion operation to obtain the address text vector; a feature expansion module, used to input the question and answer text information and the address text vector into the CNN model for a feature expansion operation to obtain the expanded text vector; an entity recognition module, used to input the address text vector and the expanded text vector into the trained named entity recognition model for an entity recognition operation to obtain the target address result; and a result output module, used to output the target address result.
  • In the process of man-machine question answering, after the audio information of the user's reply is obtained, the audio information is converted into text information and then into a question and answer text vector, and the question and answer text vector is input into the CNN model.
  • The CNN model combines the feature information of the phrases following a token with the feature information of the token itself to obtain the expanded text vector. Finally, the question and answer text vector and the expanded text vector are input into the trained named entity recognition model for named entity recognition, and the target address result is obtained.
  • Because the expanded text vector combines the feature information of the phrases following the token with the feature information of the token, it improves the generalization ability of the model for entity extraction over a specific range of suffixes without requiring a large amount of data for fitting, which reduces the model training cost and improves the recognition ability of the model.
  • Referring to FIG. 8, a schematic structural diagram of a specific implementation of the address text extraction module 130 in FIG. 7 is shown. For convenience of description, only the parts related to the present application are shown.
  • The foregoing address text extraction module 130 includes a first word segmentation submodule 131 and a first filtering submodule 132, wherein:
  • the first word segmentation sub-module 131 is used to perform word segmentation operation on the question and answer text information to obtain a plurality of words;
  • the first filtering sub-module 132 is configured to perform a filtering operation on words based on the stop word table to obtain filtered address text information.
  • The word segmentation operation may be based on string matching, that is, mechanical word segmentation: the string is scanned, and if a substring of the string is found to be identical to a word in the dictionary, it is considered a match.
  • This kind of word segmentation usually adds some heuristic rules, such as "forward/reverse maximum matching" and "long word first".
  • The second category is the word segmentation method based on statistics and machine learning.
  • The model parameters are trained on observed data (a labeled corpus). In the word segmentation stage, the probability of each candidate segmentation is calculated by the model, and the segmentation with the highest probability is taken as the final result, which finally yields the individual items of address text information.
  • The address text information may be a general term for all of the words, and is not necessarily limited to the names of the main words in the question and answer text information.
  • The address text information obtained after word segmentation can also be filtered according to the stop word table to remove unimportant words (also called stop words), for example "ah" and "oh".
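  • As an illustration of dictionary-based segmentation with the forward maximum matching heuristic, followed by stop word filtering, consider the sketch below. The dictionary, the stop word table, and the sample sentence are small made-up examples; characters that match no dictionary word are emitted as single-character words.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation that prefers the longest dictionary word."""
    max_len = max_len or max(len(w) for w in dictionary)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first ("long word first").
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

def remove_stop_words(words, stop_words):
    """Drop words that appear in the stop word table."""
    return [w for w in words if w not in stop_words]

# Sample sentence: "I live in Futian District, Shenzhen City, ah".
dictionary = {"住在", "深圳", "深圳市", "福田", "福田区"}
stop_words = {"我", "啊"}

segmented = forward_max_match("我住在深圳市福田区啊", dictionary)
filtered = remove_stop_words(segmented, stop_words)
```

  • Note that "深圳市" is chosen over the shorter dictionary entry "深圳" because longer candidates are tried first.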
  • The address text extraction module 130 includes: a second word segmentation submodule, a second filtering submodule, a first word frequency calculation submodule, a second word frequency calculation submodule, and a third filtering submodule, wherein:
  • the second word segmentation submodule is used to perform a word segmentation operation on the question and answer text information to obtain a plurality of words;
  • the second filtering submodule is used to perform a filtering operation on the words based on the stop word table to obtain the filtered words to be confirmed;
  • the first word frequency calculation submodule is used to calculate the first word frequency of each word to be confirmed in the question and answer text information;
  • the second word frequency calculation submodule is used to read the local corpus and calculate the second word frequency of each word to be confirmed in the local corpus;
  • the third filtering submodule is used to filter the words to be confirmed according to the product of the first word frequency and the second word frequency to obtain the address text information.
  • The above-mentioned address recognition device 100 based on named entity recognition further includes a training data acquisition module and a multi-round training module, wherein:
  • the training data acquisition module is used to acquire the initial training set and the data set to be recognized;
  • the multi-round training module is used to perform multiple rounds of training operations on the initial named entity recognition model based on the initial training set and the data set to be recognized until the model converges, so as to obtain the trained named entity recognition model. Each round of training operations includes: performing supervised training on the initial named entity recognition model based on the training set of this round to obtain the initial named entity recognition model after supervised training; labeling the data set to be recognized with that model to obtain the weakly labeled data set to be recognized of this round; and extracting a subset from the weakly labeled data set to be recognized obtained in this round and combining the subset with the initial training set into the training set for the next round of training.
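  • The control flow of the multi-round training can be sketched with a deliberately tiny stand-in model. Everything here is an assumption made for illustration: the "model" is a word-to-label lookup rather than BERT-CRF, the subset is selected by the share of known words (`min_known`), and a fixed number of rounds replaces the convergence check.

```python
from collections import Counter, defaultdict

def train(training_set):
    """Supervised step: memorise the most frequent label of every word."""
    counts = defaultdict(Counter)
    for words, labels in training_set:
        for word, label in zip(words, labels):
            counts[word][label] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def weak_label(model, sentence):
    """Label a sentence; also return the fraction of words the model knows."""
    labels = [model.get(w, "O") for w in sentence]
    known = sum(w in model for w in sentence) / len(sentence)
    return labels, known

def multi_round_training(initial_training_set, unlabeled_set, rounds=3, min_known=0.5):
    training_set = list(initial_training_set)
    model = {}
    for _ in range(rounds):
        model = train(training_set)                        # supervised training
        weakly_labeled = [(s, *weak_label(model, s)) for s in unlabeled_set]
        subset = [(s, labels) for s, labels, known in weakly_labeled
                  if known >= min_known]                   # extract a subset
        training_set = list(initial_training_set) + subset # next round's set
    return model

initial = [(["Shenzhen", "City"], ["B-LOC", "I-LOC"])]
unlabeled = [["Shenzhen", "Bay"], ["hello", "there"]]
model = multi_round_training(initial, unlabeled)
```

  • The initial training set is always kept in full, so weak labels can only add to, never replace, the manually labeled data.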
  • The above training data acquisition module includes: a training data acquisition submodule, a first sentence segmentation module, a third word segmentation submodule, a first sentence conversion submodule, a first length unification submodule, a second sentence segmentation module, a fourth word segmentation submodule, a second sentence conversion submodule, and a second length unification submodule, wherein:
  • the training data acquisition sub-module is used to read the local database, and obtain the pre-labeled data set and unlabeled data set in the local database;
  • the first sentence segmentation module is used to perform sentence segmentation operations on the text in the pre-labeled data set according to the sentence segmentation rules to obtain multiple pre-labeled sentences;
  • the third word segmentation sub-module is used to perform word segmentation operation on each pre-labeled sentence based on the preset word table, and obtain a pre-labeled sentence composed of multiple words, each of which has label information;
  • the first sentence conversion sub-module is used to query the word dictionary and the tag dictionary to obtain the word ID and label ID of each word, so as to convert the pre-labeled sentence into a representation in the form of word ID and label ID;
  • the first length unification sub-module is used to unify the length of the pre-labeled sentences to obtain the initial training set;
  • the second sentence segmentation module is used to segment the text in the unlabeled data set according to the sentence segmentation rules to obtain multiple unlabeled sentences;
  • the fourth word segmentation sub-module is used to perform word segmentation operation on each unlabeled sentence based on the preset word table, so as to obtain an unlabeled sentence composed of multiple words;
  • the second sentence conversion sub-module is used to convert the unlabeled sentence into the form of word identification based on the word dictionary;
  • the second length unification sub-module is used to perform length unification operation on unlabeled sentences to obtain the data set to be recognized.
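  • The preprocessing performed by these submodules (sentence segmentation, conversion to identifiers, and length unification) can be sketched as follows. The punctuation-based splitting rule, the dictionaries, the padding and unknown identifiers, and the whitespace tokenization that stands in for word segmentation with the preset word table are all illustrative assumptions.

```python
import re

PAD_ID = 0  # identifier used for padding (assumed)
UNK_ID = 1  # identifier used for words missing from the dictionary (assumed)

def split_sentences(text):
    """Split on common Chinese and Western sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"[。！？!?.]", text) if s.strip()]

def to_ids(tokens, dictionary, unk_id=UNK_ID):
    """Look up every token in a dictionary, falling back to unk_id."""
    return [dictionary.get(tok, unk_id) for tok in tokens]

def unify_length(ids, max_len, pad_id=PAD_ID):
    """Truncate or pad a sequence of identifiers to exactly max_len items."""
    return ids[:max_len] + [pad_id] * max(0, max_len - len(ids))

word_dict = {"<pad>": 0, "<unk>": 1, "I": 2, "live": 3, "in": 4, "Shenzhen": 5}
label_dict = {"<pad>": 0, "O": 1, "B-LOC": 2}

sentences = split_sentences("I live in Shenzhen. Thanks!")
tokens = sentences[0].split()  # stand-in for dictionary-based word segmentation
word_ids = unify_length(to_ids(tokens, word_dict), max_len=6)
label_ids = unify_length(to_ids(["O", "O", "O", "B-LOC"], label_dict), max_len=6)
```

  • For unlabeled sentences only the word identifiers are produced, which matches the second sentence conversion and second length unification submodules.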
  • The above-mentioned multi-round training module specifically includes: a data input submodule, a probability matrix composition submodule, an optimal sequence acquisition submodule, and a parameter adjustment submodule, wherein:
  • the data input submodule is used to input the sentences of the current round's data set into the BERT layer of the BERT-CRF model in the named entity recognition model, and obtain the encoding vectors of the words in those sentences;
  • the probability matrix composition submodule is used to input the encoding vectors into the CRF layer of the BERT-CRF model, and obtain, for each sentence of the current round, a probability matrix composed of the probability sequences of all labels corresponding to all words in the sentence;
  • the optimal sequence acquisition submodule is used to obtain the optimal labeling sequence from the probability matrix of each sentence of the current round based on the Viterbi algorithm;
  • the parameter adjustment submodule is used to obtain the recognized label of each word according to the optimal labeling sequence, and adjust the parameters of the BERT-CRF model in the named entity recognition model based on the recognized label of the word and the label of the word in the annotated data set.
  • FIG. 9 is a block diagram of the basic structure of a computer device according to this embodiment.
  • The computer device 200 includes a memory 210, a processor 220, and a network interface 230 that communicate with each other through a system bus. It should be noted that only the computer device 200 with components 210-230 is shown in the figure, but it should be understood that implementing all of the shown components is not required, and more or fewer components may be implemented instead.
  • The computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, etc.
  • the computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment.
  • the computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.
  • The memory 210 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • The computer-readable storage medium can be non-volatile or volatile.
  • the memory 210 may be an internal storage unit of the computer device 200 , such as a hard disk or a memory of the computer device 200 .
  • the memory 210 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
  • the memory 210 may also include both the internal storage unit of the computer device 200 and its external storage device.
  • The memory 210 is generally used to store the operating system and various application software installed on the computer device 200, such as the computer-readable instructions of the address recognition method based on named entity recognition.
  • the memory 210 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 220 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 220 is typically used to control the overall operation of the computer device 200 .
  • The processor 220 is configured to execute the computer-readable instructions stored in the memory 210 or to process data, for example, to execute the computer-readable instructions of the address recognition method based on named entity recognition.
  • the network interface 230 may include a wireless network interface or a wired network interface, and the network interface 230 is generally used to establish a communication connection between the computer device 200 and other electronic devices.
  • In the address recognition method based on named entity recognition provided by this application, in the process of man-machine question answering, after the audio information of the user's reply is obtained, the audio information is converted into text information and then into a question and answer text vector, and the question and answer text vector is input into the CNN model.
  • The CNN model combines the feature information of the phrases following a token with the feature information of the token itself to obtain the expanded text vector.
  • The question and answer text vector and the expanded text vector are then input into the trained named entity recognition model for named entity recognition, and the target address result is obtained.
  • Because the expanded text vector combines the feature information of the phrases following the token with the feature information of the token, it improves the generalization ability of the model for entity extraction over a specific range of suffixes without requiring a large amount of data for fitting, which reduces the cost of model training and improves the recognition ability of the model.
  • The present application also provides another embodiment, namely a computer-readable storage medium that stores computer-readable instructions, where the computer-readable instructions can be executed by at least one processor to cause the at least one processor to perform the steps of the address recognition method based on named entity recognition as described above.
  • The methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware, but in many cases the former is the better implementation.
  • The technical solution of the present application, in essence, or the part of it that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or a CD-ROM) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of this application.


Abstract

The invention relates to an address recognition method and device based on named entity recognition, as well as a computer device and a storage medium. It relates to the technical field of speech processing in artificial intelligence and also to blockchain technology, whereby a user's question and answer audio data can be stored in a blockchain. Because an expanded text vector combines feature information of the phrases following a token with feature information of the token itself, the method enables the expanded text vector to address the generalization ability of a model for entity extraction over a specific range of suffixes and eliminates the need for a large volume of data for fitting, which reduces model training costs while increasing the recognition ability of the model.
PCT/CN2021/090433 2020-12-30 2021-04-28 Address recognition method and device, and storage medium WO2022142011A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011609093.X 2020-12-30
CN202011609093.XA CN112633003B (zh) 2020-12-30 Address recognition method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022142011A1 true WO2022142011A1 (fr) 2022-07-07

Family

ID=75286641

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090433 WO2022142011A1 (fr) 2020-12-30 2021-04-28 Address recognition method and device, and storage medium

Country Status (2)

Country Link
CN (1) CN112633003B (fr)
WO (1) WO2022142011A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115081449A (zh) * 2022-08-23 2022-09-20 北京睿企信息科技有限公司 一种地址识别方法及系统
CN116991983A (zh) * 2023-09-27 2023-11-03 之江实验室 一种面向公司资讯文本的事件抽取方法及系统
CN117992600A (zh) * 2024-04-07 2024-05-07 之江实验室 一种业务执行方法、装置、存储介质以及电子设备

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
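CN112633003B (zh) * 2020-12-30 2024-05-31 平安科技(深圳)有限公司 Address recognition method and apparatus, computer device, and storage medium
CN113254639A (zh) * 2021-05-24 2021-08-13 珠海大横琴科技发展有限公司 Monitoring alarm positioning and tracking method and apparatus, and electronic device
CN113539270B (zh) * 2021-07-22 2024-04-02 阳光保险集团股份有限公司 Location recognition method and apparatus, electronic device, and storage medium
CN113535880B (zh) * 2021-09-16 2022-02-25 阿里巴巴达摩院(杭州)科技有限公司 Geographic information determination method and apparatus, electronic device, and computer storage medium
CN113836920A (zh) * 2021-10-19 2021-12-24 平安普惠企业管理有限公司 Address information recognition method and apparatus, computer device, and storage medium
CN116050402B (zh) * 2022-05-23 2023-10-20 荣耀终端有限公司 Text address recognition method, electronic device, and storage medium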

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
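CN110083831A (zh) * 2019-04-16 2019-08-02 武汉大学 Chinese named entity recognition method based on BERT-BiGRU-CRF
US20190318737A1 (en) * 2013-03-14 2019-10-17 Amazon Technologies, Inc. Dynamic gazetteers for personalized entity recognition
CN110377686A (zh) * 2019-07-04 2019-10-25 浙江大学 Address information feature extraction method based on a deep neural network model
CN110442856A (zh) * 2019-06-14 2019-11-12 平安科技(深圳)有限公司 Address information standardization method and apparatus, computer device, and storage medium
CN111738004A (zh) * 2020-06-16 2020-10-02 中国科学院计算技术研究所 Training method for a named entity recognition model, and named entity recognition method
CN111933129A (zh) * 2020-09-11 2020-11-13 腾讯科技(深圳)有限公司 Audio processing method, language model training method and apparatus, and computer device
CN112633003A (zh) * 2020-12-30 2021-04-09 平安科技(深圳)有限公司 Address recognition method and apparatus, computer device, and storage medium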

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060047500A1 (en) * 2004-08-31 2006-03-02 Microsoft Corporation Named entity recognition using compiler methods
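CN103440311A (zh) * 2013-08-27 2013-12-11 深圳市华傲数据技术有限公司 Place name entity recognition method and system
CN109299469B (zh) * 2018-10-29 2023-05-02 复旦大学 Method for recognizing complex residential addresses in long text
CN110287479B (zh) * 2019-05-20 2022-07-22 平安科技(深圳)有限公司 Named entity recognition method, electronic apparatus, and storage medium
CN111950287B (zh) * 2020-08-20 2024-04-23 广东工业大学 Text-based entity recognition method and related apparatus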

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190318737A1 (en) * 2013-03-14 2019-10-17 Amazon Technologies, Inc. Dynamic gazetteers for personalized entity recognition
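CN110083831A (zh) * 2019-04-16 2019-08-02 武汉大学 Chinese named entity recognition method based on BERT-BiGRU-CRF
CN110442856A (zh) * 2019-06-14 2019-11-12 平安科技(深圳)有限公司 Address information standardization method and apparatus, computer device, and storage medium
CN110377686A (zh) * 2019-07-04 2019-10-25 浙江大学 Address information feature extraction method based on a deep neural network model
CN111738004A (zh) * 2020-06-16 2020-10-02 中国科学院计算技术研究所 Training method for a named entity recognition model, and named entity recognition method
CN111933129A (zh) * 2020-09-11 2020-11-13 腾讯科技(深圳)有限公司 Audio processing method, language model training method and apparatus, and computer device
CN112633003A (zh) * 2020-12-30 2021-04-09 平安科技(深圳)有限公司 Address recognition method and apparatus, computer device, and storage medium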

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
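CN115081449A (zh) * 2022-08-23 2022-09-20 北京睿企信息科技有限公司 Address recognition method and system
CN115081449B (zh) * 2022-08-23 2022-11-04 北京睿企信息科技有限公司 Address recognition method and system
CN116991983A (zh) * 2023-09-27 2023-11-03 之江实验室 Event extraction method and system for company news text
CN116991983B (zh) * 2023-09-27 2024-02-02 之江实验室 Event extraction method and system for company news text
CN117992600A (zh) * 2024-04-07 2024-05-07 之江实验室 Service execution method and apparatus, storage medium, and electronic device
CN117992600B (zh) * 2024-04-07 2024-06-11 之江实验室 Service execution method and apparatus, storage medium, and electronic device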

Also Published As

Publication number Publication date
CN112633003B (zh) 2024-05-31
CN112633003A (zh) 2021-04-09

Similar Documents

Publication Publication Date Title
WO2022142011A1 (fr) Address recognition method and device, and storage medium
WO2021135910A1 (fr) Information extraction method based on machine reading comprehension, and related device
WO2022088672A1 (fr) BERT-based machine reading comprehension method and apparatus, device, and storage medium
WO2020224219A1 (fr) Chinese word segmentation method and apparatus, electronic device, and readable storage medium
US10755048B2 (en) Artificial intelligence based method and apparatus for segmenting sentence
WO2021121198A1 (fr) Entity relation extraction method and apparatus based on semantic similarity, device, and medium
WO2021135469A1 (fr) Machine learning-based information extraction method, apparatus, computer device, and medium
WO2021218028A1 (fr) Artificial intelligence-based interview content refinement method, apparatus and device, and medium
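CN112328761B (zh) Intent label setting method and apparatus, computer device, and storage medium
CN112287069B (zh) Information retrieval method and apparatus based on speech semantics, and computer device
US20220147835A1 (en) Knowledge graph construction system and knowledge graph construction method
CN111782763A (zh) Information retrieval method based on speech semantics, and related device
CN111783471B (zh) Natural language semantic recognition method, apparatus, device, and storage medium
CN112836521A (zh) Question-answer matching method and apparatus, computer device, and storage medium
CN115099239B (zh) Resource identification method, apparatus, device, and storage medium
CN111126084B (zh) Data processing method and apparatus, electronic device, and storage medium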
WO2021139076A1 (fr) Intelligent text dialogue generation method and apparatus, and computer-readable storage medium
CN112084779A (zh) Entity acquisition method, apparatus, device, and storage medium for semantic recognition
WO2022073341A1 (fr) Disease entity matching method and apparatus based on speech semantics, and computer device
WO2023092719A1 (fr) Information extraction method for medical record data, terminal device, and readable storage medium
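CN113505595A (zh) Text phrase extraction method and apparatus, computer device, and storage medium
CN113987162A (zh) Text summary generation method and apparatus, and computer device
CN112632956A (zh) Text matching method and apparatus, terminal, and storage medium
CN112417875A (zh) Configuration information update method and apparatus, computer device, and medium
CN116796730A (zh) Artificial intelligence-based text error correction method, apparatus, device, and storage medium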

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21912773

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21912773

Country of ref document: EP

Kind code of ref document: A1