CN115757814A - Corpus expansion writing method, apparatus, device and storage medium for speech recognition - Google Patents


Info

Publication number: CN115757814A
Application number: CN202211367688.8A
Authority: CN (China)
Prior art keywords: text, corpus, word, speech, node
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Original/Current Assignee: Xinghe Zhilian Automobile Technology Co Ltd
Inventors: 黄姿荣, 刘俊峰, 张莹, 汪华锋, 彭璐, 甄磊
Other languages: Chinese (zh)
Application filed by Xinghe Zhilian Automobile Technology Co Ltd

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a corpus expansion method, apparatus, device and storage medium for speech recognition. Texts in the collected corpus to be expanded are segmented by a maximum forward matching algorithm, the part of speech of each segmented word in the text is determined, and segmented words with pre-marked parts of speech are determined as entities according to a preset context collocation rule; the corpus to be expanded is recognized by a pre-built scene recognition model, and texts in which two entities exist are judged as in-set texts; child nodes are created under parent nodes retrieved from a pre-created knowledge graph through set aggregation calculation, and the entities of the in-set texts are integrated into the knowledge graph; the newly added nodes in the knowledge graph are synchronized into a speech recognition model through a protocol generation rule so that they take effect as word slots, serving as a speech fallback strategy. The method can expand the corpus for speech recognition autonomously, greatly improving the efficiency and precision of corpus expansion.

Description

Corpus expansion method, apparatus, device and storage medium for speech recognition
Technical Field
The invention relates to the technical field of intelligent perception, and in particular to a corpus expansion method, apparatus, device and storage medium for speech recognition.
Background
Voice interaction has become an important human-machine interaction mode in today's intelligent cockpit, but the efficiency of semantic recognition during voice interaction is low, which is mainly embodied in two aspects: the first is rejection, resulting in no response or "I don't understand"; the second is execution deviating from the original intent, e.g., "open the rear window" resulting in all windows being opened. The main causes of both are insufficient semantic scene coverage and corpus deficiency in the model. The current solution relies on personnel to label rejected corpora and expand the corpus in order to improve semantic understanding. However, the efficiency and precision of such a manual corpus expansion scheme are low.
Disclosure of Invention
In order to solve the above problems, the present invention provides a corpus expansion method, apparatus, device and storage medium for speech recognition, which can expand the corpus of speech recognition autonomously, thereby greatly improving the efficiency and precision of corpus expansion.
The embodiment of the invention provides a corpus expanding and writing method for voice recognition, which comprises the following steps:
segmenting the texts in the collected corpus to be expanded through a maximum forward matching algorithm, determining the part of speech of each segmented word in the text, and determining segmented words with pre-marked parts of speech as entities according to a preset context collocation rule;
recognizing the corpus to be expanded with a pre-built scene recognition model, and judging texts in which two entities exist as in-set texts;
creating child nodes under parent nodes retrieved from a pre-created knowledge graph through set aggregation calculation, and integrating the entities of the in-set texts into the knowledge graph;
and synchronizing the newly added nodes in the knowledge graph into a speech recognition model through a protocol generation rule so that they take effect as word slots, serving as a speech fallback strategy.
Preferably, the segmenting of the texts in the collected corpus to be expanded through the maximum forward matching algorithm specifically comprises:
S201, from front to back, taking a character string of a preset maximum matching character length from the text in the corpus to be expanded;
S202, judging whether the character string exists in a preset corpus;
if not, executing step S203;
if yes, the character string is segmented successfully, and going to step S205;
S203, the character string failed to match, so deleting the last character of the character string;
S204, judging whether the length of the character string is 0;
if yes, segmentation of the current character string of maximum matching character length fails, and step S205 is executed;
if not, returning to step S202;
S205, judging whether the length of the characters remaining in the text of the corpus to be expanded after segmentation is 0;
if yes, ending the segmentation judgment;
if not, segmentation is unfinished, and step S206 is executed;
S206, judging whether the length of the remaining characters of the text after segmentation is not more than the maximum matching character length;
if yes, taking all the remaining characters as the next character string, and returning to step S202;
if not, taking a new character string of the maximum matching character length from front to back among the remaining characters, and returning to step S202.
Preferably, the corpus to be expanded includes text data that cannot be recognized by a preset semantic model library during speech recognition, text data with an instruction execution error during speech recognition, text data that is determined as a repeat instruction during speech recognition, and text data that is determined as negative by a preset emotion analysis model during speech recognition;
the determining of the part of speech of the segmented words in the text specifically comprises:
calculating a first probability P_T(n) = C_(T,n)/M that the segmented word T is classified as a noun, and a second probability P_T(v) = C_(T,v)/M that it is classified as a verb;
when the first probability is not less than the second probability and not less than a preset segmentation probability, determining the word T as a noun;
when the second probability is not less than the first probability and not less than the preset segmentation probability, determining the word T as a verb;
wherein M is the total number of occurrences of the word T in a preset training corpus, C_(T,n) is the first number of times the word T in the training corpus is marked as a noun by a preset part-of-speech tagging model, and C_(T,v) is the second number of times the word T in the training corpus is marked as a verb by the part-of-speech tagging model.
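The noun/verb decision rule above can be sketched in Python; the function name, argument names, and the default threshold value are illustrative assumptions, not taken from the patent:

```python
def classify_pos(count_noun, count_verb, total, p_min=0.5):
    """Classify word T as a noun or verb from part-of-speech tagging counts.

    count_noun -- C_(T,n): times T was tagged as a noun in the training corpus
    count_verb -- C_(T,v): times T was tagged as a verb
    total      -- M: total occurrences of T in the training corpus
    p_min      -- preset segmentation probability threshold (assumed value)
    """
    p_n = count_noun / total  # first probability  P_T(n) = C_(T,n) / M
    p_v = count_verb / total  # second probability P_T(v) = C_(T,v) / M
    if p_n >= p_v and p_n >= p_min:
        return "noun"
    if p_v >= p_n and p_v >= p_min:
        return "verb"
    return None  # neither rule fires; part of speech left undetermined
```

Note that when both probabilities are equal and above the threshold, the first rule fires and the word is treated as a noun; the patent does not state a tie-breaking order, so this is one possible reading.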
As a preferred scheme, the creating of child nodes under parent nodes retrieved from the pre-created knowledge graph through set aggregation calculation, and the integrating of the entities of the in-set texts into the knowledge graph, specifically comprise:
performing node retrieval in the knowledge graph for the two entities of the in-set text through set aggregation calculation;
and taking the entity whose node is retrieved as the parent node, serving as the scene layer or intent layer of the knowledge graph, and taking the other entity as a child node, serving as the word slot layer of the knowledge graph, thereby completing the integration of the entities of the in-set texts.
Preferably, the method further comprises:
adding an out-of-set root node in the knowledge graph for the out-of-set texts in the corpus to be expanded that were not determined as in-set texts;
traversing the newly added out-of-set root nodes, and judging whether any entity of the out-of-set text already exists as a root node;
if such an entity exists, no root node is newly added for that entity, and the other entities in the out-of-set text become its child nodes;
when no entity of the out-of-set text has a root node, taking the entity ranked earlier in the weight sequence as the parent node and the entity ranked later as a child node;
the weight sequence is specifically a sequence obtained by counting the occurrences of all entities in the out-of-set texts and sorting by the counts.
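The out-of-set integration steps above can be sketched as follows, under the assumption that the knowledge graph is represented as a simple mapping from parent node to child nodes; the real graph structure is not specified by the patent:

```python
from collections import Counter

def integrate_out_of_set(graph, texts_entities):
    """Integrate out-of-set texts into a knowledge graph.

    graph          -- hypothetical structure: dict mapping root node -> set of children
    texts_entities -- list of entity lists, one per out-of-set text
    """
    # weight sequence: entities ranked by how often they occur across texts
    freq = Counter(e for ents in texts_entities for e in ents)
    for ents in texts_entities:
        known = [e for e in ents if e in graph]  # entity already a root node?
        if known:
            root = known[0]
            for e in ents:
                if e != root:
                    graph.setdefault(root, set()).add(e)  # others become children
        else:
            # no root exists: earlier-ranked entity becomes the parent
            ranked = sorted(ents, key=lambda e: -freq[e])
            parent, children = ranked[0], ranked[1:]
            graph.setdefault(parent, set()).update(children)
    return graph
```

For example, two out-of-set texts sharing the entity 自动驾驶 ("automatic driving") would yield one root node with the remaining entities as its children.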
As a preferred scheme, the synchronizing of the newly added nodes in the knowledge graph into a speech recognition model through a protocol generation rule so that they take effect as word slots serving as a speech fallback strategy specifically comprises:
calling a Chinese-English translation interface, translating the node entity into English, and packing it into a data structure;
and synchronizing the data structure packed from the text and the protocol field of the text into the speech recognition model for training, so that the newly added node entity takes effect as a word slot and becomes a feature in the speech recognition model; when a similar spoken phrase is recognized, the word slot is used as a parameter of the speech fallback text to perform the corresponding instruction control.
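A minimal sketch of the packed data structure, assuming JSON-style fields; every field name and the protocol format here are illustrative assumptions, since the patent does not disclose the actual protocol:

```python
def pack_word_slot(entity_zh, entity_en, scene, intent):
    """Pack a new knowledge-graph node into the structure synchronized to the
    speech recognition model (field names are illustrative, not the patent's)."""
    return {
        "slot": entity_en,      # English translation becomes the slot name
        "value": entity_zh,     # original Chinese entity
        "scene": scene,         # parent scene layer in the knowledge graph
        "intent": intent,       # intent layer
        # hypothetical protocol field used when training the model
        "protocol": f"{scene}.{intent}.{entity_en}",
    }
```

For instance, a newly integrated node for 空调 ("air conditioner") under a vehicle-control scene would be packed once and then synchronized into the model for training.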
Preferably, after determining the entity, the method further comprises:
acquiring search hot words within a first preset period from search websites, acquiring network topics and hot words within the first preset period from mainstream network platforms, and taking the acquired search hot words, network topics and hot words as training data;
performing deduplication on the obtained training data through a neural network model, judging the service relevance of the deduplicated training data, and removing training data irrelevant to the service;
and marking the parts of speech of the training data from which irrelevant data has been removed, and adding them to the part-of-speech tagging model for training.
Another embodiment of the present invention provides a corpus expanding and writing device for speech recognition, including:
the parsing module is used for segmenting the texts in the collected corpus to be expanded through a maximum forward matching algorithm, determining the part of speech of each segmented word in the text, and determining segmented words with pre-marked parts of speech as entities according to a preset context collocation rule;
the recognition module is used for recognizing the corpus to be expanded with a pre-built scene recognition model, and determining texts in which two entities exist as in-set texts;
the first integration module is used for creating child nodes under parent nodes retrieved from the pre-created knowledge graph through set aggregation calculation, and integrating the entities of the in-set texts into the knowledge graph;
and the validation module is used for synchronizing the newly added nodes in the knowledge graph into the speech recognition model through the protocol generation rule so that they take effect as word slots, serving as the speech fallback strategy.
As a preferred scheme, the parsing module is specifically configured to:
s201, taking a preset character string with the maximum matching character length from front to back in the text in the corpus to be expanded;
s202, judging whether the character string exists in a preset corpus or not;
when not present, step S203 is performed;
if yes, the character string is successfully segmented, and the step S205 is returned;
s203, deleting the last character of the character string when the character string fails to be segmented;
s204, judging whether the length of the character string is 0 or not;
if yes, the present character string with the maximum matching character length fails to be segmented, and step S205 is executed;
if not, returning to the step S202;
s205, judging whether the remaining character length of the text in the corpus to be expanded and written after the word segmentation is 0;
if yes, ending word segmentation judgment;
if not, the word segmentation is not finished, and step S206 is executed;
s206, judging whether the remaining character length of the text in the corpus to be expanded and written after the word segmentation is not more than the maximum matching character length;
if yes, taking all the remaining characters as the obtained character string, and returning to the step 202;
if not, the character string with the maximum matching character length is taken again from front to back in all the remaining characters, and the step S202 is returned.
Preferably, the corpus to be expanded includes text data that cannot be recognized by a preset semantic model library during speech recognition, text data with an instruction execution error during speech recognition, text data that is determined as a repeat instruction during speech recognition, and text data that is determined as negative by a preset emotion analysis model during speech recognition;
the parsing module is further configured to:
calculating the first probability P_T(n) = C_(T,n)/M that the segmented word T is classified as a noun, and the second probability P_T(v) = C_(T,v)/M that it is classified as a verb;
when the first probability is not less than the second probability and not less than a preset segmentation probability, determining the word T as a noun;
when the second probability is not less than the first probability and not less than the preset segmentation probability, determining the word T as a verb;
wherein M is the total number of occurrences of the word T in a preset training corpus, C_(T,n) is the first number of times the word T in the training corpus is marked as a noun by a preset part-of-speech tagging model, and C_(T,v) is the second number of times the word T in the training corpus is marked as a verb by the part-of-speech tagging model.
Preferably, the first integration module is specifically configured to:
performing node retrieval in the knowledge graph for the two entities of the in-set text through set aggregation calculation;
and taking the entity whose node is retrieved as the parent node, serving as the scene layer or intent layer of the knowledge graph, and taking the other entity as a child node, serving as the word slot layer of the knowledge graph, thereby completing the integration of the entities of the in-set texts.
Preferably, the apparatus further comprises:
a second integration module specifically configured to:
adding an out-of-set root node in the knowledge graph for the out-of-set texts in the corpus to be expanded that were not determined as in-set texts;
traversing the newly added out-of-set root nodes, and judging whether any entity of the out-of-set text already exists as a root node;
if such an entity exists, no root node is newly added for that entity, and the other entities in the out-of-set text become its child nodes;
when no entity of the out-of-set text has a root node, taking the entity ranked earlier in the weight sequence as the parent node and the entity ranked later as a child node;
the weight sequence is specifically a sequence obtained by counting the occurrences of all entities in the out-of-set texts and sorting by the counts.
Preferably, the validation module is specifically configured to:
calling a Chinese-English translation interface, translating the node entity into English, and packing it into a data structure;
and synchronizing the data structure packed from the text and the protocol field of the text into the speech recognition model for training, so that the newly added node entity takes effect as a word slot and becomes a feature in the speech recognition model; when a similar spoken phrase is recognized, the word slot is used as a parameter of the speech fallback text to perform the corresponding instruction control.
Preferably, the apparatus further comprises:
the quality inspection module is specifically used for:
acquiring search hot words within a first preset period from search websites, acquiring network topics and hot words within the first preset period from mainstream network platforms, and taking the acquired search hot words, network topics and hot words as training data;
performing deduplication on the obtained training data through a neural network model, judging the service relevance of the deduplicated training data, and removing training data irrelevant to the service;
and marking the parts of speech of the training data from which irrelevant data has been removed, and adding them to the part-of-speech tagging model for training.
The embodiment of the present invention further provides a terminal device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, and when the processor executes the computer program, the processor implements the corpus expansion method for speech recognition according to any one of the above embodiments.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, a device in which the computer-readable storage medium is located is controlled to execute the corpus expansion method for speech recognition according to any one of the above embodiments.
The invention provides a corpus expansion method, apparatus, device and storage medium for speech recognition. The method comprises: segmenting the texts in the collected corpus to be expanded through a maximum forward matching algorithm, determining the part of speech of each segmented word in the text, and determining segmented words with pre-marked parts of speech as entities according to a preset context collocation rule; recognizing the corpus to be expanded with a pre-built scene recognition model, and judging texts in which two entities exist as in-set texts; creating child nodes under parent nodes retrieved from a pre-created knowledge graph through set aggregation calculation, and integrating the entities of the in-set texts into the knowledge graph; and synchronizing the newly added nodes in the knowledge graph into a speech recognition model through a protocol generation rule so that they take effect as word slots, serving as a speech fallback strategy. In-set texts are determined by segmenting and labeling the corpus to be expanded; through set aggregation calculation, the in-set texts are integrated into the knowledge graph of the speech recognition corpus, and the integrated nodes are put into effect, completing the expansion of the corpus. The corpus of speech recognition can thus be expanded autonomously, greatly improving the efficiency and precision of corpus expansion.
Drawings
Fig. 1 is a schematic flow chart of a corpus expansion method for speech recognition according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of corpus word segmentation according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a corpus expansion apparatus for speech recognition according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, which is a schematic flow chart of a corpus expansion method for speech recognition according to an embodiment of the present invention, the method comprises steps S1 to S4:
S1, segmenting the texts in the collected corpus to be expanded through a maximum forward matching algorithm, determining the part of speech of each segmented word in the text, and determining segmented words with pre-marked parts of speech as entities according to a preset context collocation rule;
S2, recognizing the corpus to be expanded with a pre-built scene recognition model, and judging texts in which two entities exist as in-set texts;
S3, creating child nodes under parent nodes retrieved from a pre-created knowledge graph through set aggregation calculation, and integrating the entities of the in-set texts into the knowledge graph;
and S4, synchronizing the newly added nodes in the knowledge graph into a speech recognition model through a protocol generation rule so that they take effect as word slots, serving as a speech fallback strategy.
In the specific implementation of this embodiment, the data of the corpus to be expanded is received as input and parsed: the corpus to be expanded is segmented, the part of speech of each segmented word is determined, and the entities among the segmented words are determined according to the context collocation rule. This specifically comprises:
segmenting the texts in the collected corpus to be expanded through a maximum forward matching algorithm, identifying the part of speech of each segmented word, and determining segmented words of different parts of speech as entities according to the identified parts of speech, the collocation order of the segmented words in each corpus, and a preset context collocation rule.
For example, for the corpus to be expanded "help me open the air conditioner", recognition according to the established context collocation rule identifies "open" as a verb and "air conditioner" as a noun. To judge whether a word is an entity, the preset context collocation rule is applied: because of the v+n structure, in which n is an entity, the word "air conditioner" is judged to be an entity. Entities in other contexts can be identified by setting other context collocation rules.
The in-set text label represents the scene types that can currently be identified, such as music, air conditioner, sound effect and telephone, which have been trained into the scene recognition model; the out-of-set text label represents types that were not planned for recognition in the earlier stage, such as automatic driving and assisted driving scenes, for which no training corpus exists in the set.
The two labels, in-set text and out-of-set text, are distinguished specifically by judging whether the text can be identified by the scene recognition model and by the number of entities in the text.
When the text of the corpus to be expanded can be identified by the scene recognition model and the number of entities in the text is two, the text is judged to be an in-set text;
otherwise, the text is judged to be an out-of-set text.
Distinguishing the two labels places no high demand on model accuracy; a binary classification suffices: in-set text data has an identifiable scene and exactly 2 entities, and all other text is judged to be out-of-set.
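The binary in-set/out-of-set rule can be sketched as follows; `scene_model` stands in for the pre-built scene recognition model and is assumed to return `None` when no scene is identified:

```python
def is_in_set(text, scene_model, entities):
    """In-set label: the scene recognition model identifies a scene
    AND the text contains exactly two entities; everything else is
    judged out-of-set."""
    return scene_model(text) is not None and len(entities) == 2
```

For example, with a scene model that recognizes the air-conditioner scene, "help me open the air conditioner" with entities ["open", "air conditioner"] is labeled in-set, while an unplanned automatic-driving utterance is labeled out-of-set.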
Child nodes are created under the parent nodes retrieved from the pre-created knowledge graph through set aggregation calculation, and the entities of the in-set texts are integrated into the knowledge graph, realizing the integration of the in-set texts.
The newly added nodes in the knowledge graph are synchronized into the speech recognition model through the protocol generation rule so that they take effect as word slots, serving as the speech fallback strategy; after the speech recognition model subsequently identifies a similar text, the entity whose word slot has taken effect triggers the effective word slot layer to execute the speech fallback strategy, thereby completing the corresponding speech response.
In-set texts are determined by segmenting and labeling the corpus to be expanded; through set aggregation calculation, the in-set texts are integrated into the knowledge graph of the speech recognition corpus, and the integrated nodes are put into effect, completing the expansion of the corpus. The corpus of speech recognition can thus be expanded autonomously, greatly improving the efficiency and precision of corpus expansion.
Example two
In another embodiment provided by the present invention, referring to fig. 2, a schematic flow chart of corpus participles provided by the embodiment of the present invention is shown, which includes the following steps:
the word segmentation of the collected texts in the corpus to be expanded and written through the maximum forward matching algorithm specifically includes:
s201, taking a preset character string with the maximum matching character length from front to back in the text in the corpus to be expanded;
s202, judging whether the character string exists in a preset corpus or not;
when not present, step S203 is performed;
if yes, the character string is successfully segmented, and the step S205 is returned;
s203, deleting the last character of the character string when the character string fails to be segmented;
s204, judging whether the length of the character string is 0 or not;
if yes, the step S205 is executed if the word segmentation of the character string with the maximum matching character length fails;
if not, returning to the step S202;
s205, judging whether the length of the remaining characters of the text in the corpus to be expanded and written after the word segmentation is 0;
if yes, ending word segmentation judgment;
if not, the word segmentation is not finished, and step S206 is executed;
s206, judging whether the remaining character length of the text in the corpus to be expanded and written after the word segmentation is not more than the maximum matching character length;
if yes, taking all the remaining characters as the obtained character string, and returning to the step 202;
if not, the character string with the maximum matching character length is taken again from front to back in all the remaining characters, and the step S202 is returned.
When this embodiment is specifically implemented, segmentation is performed by maximum forward matching: the maximum matching character length is set, a string of that length is taken from the text, and whether the selected character string exists in the corpus is judged;
if not, the last character of the character string is deleted and the existence check against the preset corpus is repeated, until the character string exists in the preset corpus or its length is 0;
when the character string exists, its segmentation succeeds; the next character string of maximum matching character length is taken after the successfully segmented one, the existence check is performed again, and the text in the corpus to be expanded is segmented in sequence.
in one embodiment, the maximum matching character length is set to 5, i.e., the maximum length of a word is assumed to be 5.
The words present in the predetermined corpus are: "we", "often", "intentionally" or "opinion", "divergence", "i", "people", "often", "having", "meaning", "seeing";
we divide by maximum forward match "we often intentionally see a divergence".
A first round: and taking a character string 'we often have' according to the maximum matching character length, taking words in a forward direction, and removing the last character of a matching field every time if matching fails.
"we are frequent", matching 5 words in the corpus, no matching, substring length minus 1 becomes "we are frequent".
"We often" match 4 words in the corpus, without matching, becoming "our longitude".
"We pass" by matching 3 words in the corpus, no match, becoming "we".
"We" match 2 words in the corpus, match successfully, output "We", input becomes "often intentionally diverge".
And a second round: take "often intentionally in the remaining text by the maximum matching character length;
"often opinion", matching 5-word words in the corpus, no matching, substring length minus 1 becomes "often intentional".
"often intentional", 4-word words are matched in the corpus, no match, substring length minus 1 becomes "often".
"often", 3-word words are matched in the corpus, no match, substring length minus 1 becomes "often".
"often," there is a match, output "often," input becomes "intentionally disambiguated," in the corpus.
And so on, until the input length is 0, the scanning is terminated.
Finally, the word segmentation obtained each time is recorded, and the result obtained by the forward maximum matching algorithm is as follows: we/often/opinion/divergence.
And accurately completing word segmentation of the linguistic data to be expanded through maximum forward matching.
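The worked example above can be reproduced with a short maximum-forward-matching sketch; emitting a single character when no dictionary word matches is an assumed fallback, since the patent leaves that case open:

```python
def max_forward_match(text, lexicon, max_len=5):
    """Maximum forward matching segmentation (steps S201-S206)."""
    words = []
    while text:                                # S205: stop when nothing remains
        chunk = text[:max_len]                 # S201/S206: take up to max_len chars
        while chunk and chunk not in lexicon:  # S202/S204: shrink on failure
            chunk = chunk[:-1]                 # S203: delete the last character
        if not chunk:
            chunk = text[0]  # assumed fallback: emit a single character
        words.append(chunk)
        text = text[len(chunk):]
    return words

lexicon = {"我们", "经常", "有意见", "意见", "分歧",
           "我", "们", "经", "常", "有", "意", "见"}
print("/".join(max_forward_match("我们经常有意见分歧", lexicon)))
# prints 我们/经常/有意见/分歧
```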
Example three
In another embodiment provided by the present invention, the corpus to be expanded includes text data that cannot be recognized by a preset semantic model library during speech recognition, text data for which instruction execution fails during speech recognition, text data determined to be a repeated instruction during speech recognition, and text data determined to be negative by a preset emotion analysis model during speech recognition;
the determining the part of speech of the segmented word in the text specifically comprises:
calculating a first probability P_T(n) = C_(T,n)/M that the segmented word T is a noun, and a second probability P_T(v) = C_(T,v)/M that it is a verb;
when the first probability is not less than the second probability and not less than a preset word segmentation probability, determining the word T to be a noun;
when the second probability is not less than the first probability and not less than the preset word segmentation probability, determining the word T to be a verb;
wherein M is the total number of occurrences of the word T in a preset training corpus, C_(T,n) is the number of times the word T in the training corpus is tagged as a noun by a preset part-of-speech tagging model, and C_(T,v) is the number of times the word T in the training corpus is tagged as a verb by the part-of-speech tagging model.
In this embodiment, corpus expansion mainly detects and collects rejected texts and texts whose intention was executed incorrectly. A rejected text is data that the cloud semantic model identifies as unknown. The main characteristics of an incorrectly executed intention are repeated instructions, negative-emotion texts, and unsuccessful execution: a repeated instruction is the same instruction issued repeatedly in the same context; a negative-emotion text is a text that the emotion analysis model judges to be negative; and unsuccessful execution is an instruction issued by the cloud for which the local end feeds back no confirmation of successful execution.
By taking data that the original speech recognition could not process as the corpus to be expanded, unknown corpora can be expanded automatically and accurately, expansion efficiency is improved, and the waste of resources caused by re-expanding corpora that can already be recognized is avoided.
When determining the part of speech of a segmented word, only nouns and verbs are considered; other, unneeded parts of speech are generally not identified against the corpus, which improves the execution efficiency of the model.
Calculate the first probability P_T(n) = C_(T,n)/M that the segmented word T is a noun, and the second probability P_T(v) = C_(T,v)/M that it is a verb, where M is the total number of occurrences of the word T in the preset training corpus, C_(T,n) is the number of times the word T is tagged as a noun in the training corpus by a preset part-of-speech tagging model, and C_(T,v) is the number of times it is tagged as a verb;
when the first probability is not less than the second probability and not less than a preset word segmentation probability, the word T is determined to be a noun;
when the second probability is not less than the first probability and not less than the preset word segmentation probability, the word T is determined to be a verb;
when the parts of speech identified in the corpus include other word classes, the part of speech can be determined on the same principle, by calculating each probability and comparing them.
Through this probability calculation, the part of speech of a word can be determined accurately, which facilitates the subsequent entity judgment and entity integration.
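The decision rule above can be sketched as a small helper; the counts and the 0.5 threshold in the usage lines are illustrative values, not figures from the disclosure.

```python
def decide_pos(count_noun, count_verb, total, threshold):
    """Noun/verb decision from part-of-speech tagging counts.

    P_T(n) = C_Tn / total and P_T(v) = C_Tv / total, where total is
    the number of occurrences of word T in the training corpus.
    Returns "n" or "v", or None when neither probability reaches the
    preset word segmentation probability threshold. On a tie the
    noun branch wins, mirroring the order of the conditions above.
    """
    p_noun = count_noun / total
    p_verb = count_verb / total
    if p_noun >= p_verb and p_noun >= threshold:
        return "n"
    if p_verb >= p_noun and p_verb >= threshold:
        return "v"
    return None

print(decide_pos(70, 20, 100, 0.5))  # → n
print(decide_pos(20, 70, 100, 0.5))  # → v
```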
Example four
In another embodiment provided by the present invention, the step S3 specifically includes:
performing node retrieval in the knowledge graph for the two entities of each in-set text through aggregation calculation;
and taking the node whose path is retrieved as the parent node, serving as the scene layer or intention layer of the knowledge graph, and taking the other node as a child node, serving as the word slot layer of the knowledge graph, thereby completing the integration of the entities of the in-set texts.
In the specific implementation of this embodiment, each in-set text has two entities: the node of one entity belongs to the scene layer or intention layer, and the node of the other entity belongs to the word slot layer;
node retrieval is performed for the two entities in the pre-established knowledge graph using aggregation calculation, and when the node path of one entity is retrieved, that node is taken as the parent node, serving as the scene layer or intention layer of the knowledge graph, while the node of the other entity becomes its child node, completing the integration of the entities of the in-set texts.
For example, for "adjust the sound effect to cinema mode", "sound effect" is one node and "cinema mode" is the other; both are retrieved, and when "sound effect" is found to exist as an intention-layer node, "cinema mode" becomes its child node in the word slot layer.
By applying aggregation calculation to the in-set texts, their integration can be completed quickly, realizing the expansion of the in-set texts in the corpus to be expanded.
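The retrieve-then-attach step can be sketched with the knowledge graph reduced to a parent-to-children mapping; the node names mirror the "sound effect"/"cinema mode" example, and the pre-existing "surround mode" child is an assumption made for illustration.

```python
def integrate_in_set(graph, entity_a, entity_b):
    """Integrate the two entities of an in-set text.

    If one entity is already a node (scene or intention layer), the
    other is attached as its child in the word slot layer. Returns
    the (parent, child) pair, or None when neither node is found.
    """
    for parent, child in ((entity_a, entity_b), (entity_b, entity_a)):
        if parent in graph:            # node path retrieved
            graph[parent].add(child)   # child joins the word slot layer
            return parent, child
    return None

graph = {"sound effect": {"surround mode"}}   # assumed existing intention node
print(integrate_in_set(graph, "cinema mode", "sound effect"))
# → ('sound effect', 'cinema mode')
```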
Example five
In another embodiment provided by the present invention, the method further comprises:
for each out-of-set text, i.e. each text in the corpus to be expanded that is not determined to be an in-set text, adding an out-of-set root node in the knowledge graph;
traversing the newly added out-of-set root nodes, and judging whether any entity of the out-of-set text already exists under a root node;
if such an entity exists, no new root node is added for that entity, and the other entities of the out-of-set text become its child nodes;
when none of the entities of the out-of-set text has a root node, taking the entity ranked first in the weight sequence for that text as the parent node and the later-ranked entities as child nodes;
the weight sequence is a sequence obtained by counting the number of occurrences of all entities in the out-of-set texts and sorting them by that count.
In the specific implementation of this embodiment, the texts in the corpus to be expanded are first screened to identify the in-set texts, and the remaining texts are treated as out-of-set texts;
for the entities in the out-of-set texts, each occurrence of the same entity increments its count by 1, so the more often an entity appears, the higher its weight value; all entities are counted and sorted by weight value to obtain the weight sequence;
an out-of-set root node is added in the knowledge graph for each out-of-set text that is not determined to be an in-set text in the corpus to be expanded;
the newly added out-of-set root nodes are traversed to judge whether any entity of the out-of-set text already exists under a root node;
if such an entity exists, no new root node is added for that entity, and the other entities of the out-of-set text become its child nodes;
and when none of the entities of the out-of-set text has a root node, the entity ranked first in the weight sequence is taken as the parent node and the later-ranked entities as child nodes, completing the integration of the out-of-set texts.
By traversing the newly added root nodes, the integration of the out-of-set texts is realized.
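A simplified sketch of embodiment five's weight-sequence rule, again with the knowledge graph reduced to a parent-to-children mapping; the entity names are invented for illustration, and the out-of-set root-node bookkeeping is collapsed into the mapping itself.

```python
from collections import Counter

def integrate_out_of_set(graph, out_texts):
    """out_texts: one entity list per out-of-set text.

    Entity weights are occurrence counts over all out-of-set texts.
    For each text, if some entity already has a node, no new root is
    added and the remaining entities become its children; otherwise
    the entity at the front of the weight sequence becomes the parent.
    """
    weights = Counter(e for entities in out_texts for e in entities)
    for entities in out_texts:
        existing = [e for e in entities if e in graph]
        if existing:
            parent = existing[0]       # no new root node is added
        else:
            parent = max(entities, key=lambda e: weights[e])
            graph[parent] = set()
        for e in entities:
            if e != parent:
                graph[parent].add(e)   # later-ranked entities become children
    return graph

result = integrate_out_of_set({}, [["navigation", "night mode"],
                                   ["navigation", "mute"]])
# "navigation" (weight 2) becomes the parent of "night mode" and "mute"
```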
Example six
In another embodiment provided by the present invention, the step S4 specifically includes:
calling a Chinese-English translation interface, translating the node entity into English, and packaging the result into a data structure;
and synchronizing the packaged data structure, together with the text and its protocol fields, into the speech recognition model for training, so that the newly added node entity takes effect as a word slot and becomes a feature in the speech recognition model; when a similar spoken utterance is recognized, the word slot is used as a parameter of the speech fallback text to perform the corresponding instruction control.
In the specific implementation of this embodiment, a word slot newly added to the knowledge graph becomes a child node; through the protocol generation rule, the Chinese-English translation interface is called to translate the node entity into English, and the result is packaged into a data structure, for example: {service: "scene-layer node English translation", intent: "intention-layer node English translation", slot: "word-slot-layer node English translation", text: "source text the node was extracted from", word slot: "node entity name"}.
The data structure packaged from the text, together with the protocol fields of the text, is synchronized into the speech recognition model for training, so that the newly added node entity takes effect as a word slot and becomes a feature in the semantic model;
when a similar spoken utterance is recognized, the word slot is used as a parameter of the speech fallback text. For example, if the user says "turn on the air conditioner energy-saving mode" and no energy-saving mode currently exists, the mechanism creates an "energy-saving mode" entity under the "air conditioner" parent node, forming an "air conditioner - energy-saving mode" parent-child pair, which is synchronized into the recognition model. The next time the user says "turn on the air conditioner energy-saving mode", the cloud model recognizes the semantics and returns the fallback text "the current air conditioner has no energy-saving mode", avoiding an unanswered silence. In addition, after a newly added parent node in the knowledge graph exceeds a set threshold, it is automatically pushed to the back end, where product personnel configure and develop the corresponding function change. This addresses the inability to perceive functions that users request frequently but that are not yet implemented.
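A minimal sketch of the packaging step, assuming JSON as the wire format; the field names follow the example structure above (with `word_slot` standing in for the "word slot" field), the English translations are passed in already computed, the translation interface itself is not modeled, and the argument values in the usage line are invented.

```python
import json

def build_slot_payload(service_en, intent_en, slot_en, source_text, entity_name):
    """Package a newly added knowledge-graph node into the protocol
    data structure synchronized to the speech recognition model."""
    payload = {
        "service": service_en,     # scene-layer node, English translation
        "intent": intent_en,       # intention-layer node, English translation
        "slot": slot_en,           # word-slot-layer node, English translation
        "text": source_text,       # source text the node was extracted from
        "word_slot": entity_name,  # node entity name
    }
    return json.dumps(payload, ensure_ascii=False)

print(build_slot_payload("vehicle_audio", "adjust_sound_effect", "cinema_mode",
                         "adjust the sound effect to cinema mode", "cinema mode"))
```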
Example seven
In another embodiment provided by the present invention, after determining the entity, the method further comprises:
acquiring the search hot words of a first preset period from search websites, acquiring the network topics and hot words of the first preset period from mainstream network platforms, and taking the acquired search hot words, network topics, and hot words as training data;
deduplicating the acquired training data, performing service-relevance judgment on the deduplicated training data through a neural network model, and removing the training data unrelated to the service;
and tagging the parts of speech of the training data with the irrelevant data removed, and adding them to the part-of-speech tagging model for training.
In the specific implementation of this embodiment, entity identification depends on part-of-speech tagging and statistical methods; however, new network words, trending terms, and the like lack part-of-speech tags, so they are difficult to judge or are judged incorrectly. A neural-network-based detection mechanism is therefore introduced, specifically:
acquiring the daily top-50 search hot words from search websites, acquiring the daily network topics and hot words from mainstream network platforms, and taking the acquired search hot words, network topics, and hot words as training data;
deduplicating the acquired training data, performing service-relevance judgment on the deduplicated training data through a neural network model, where an output of 0 means unrelated to the service and 1 means related, and removing the training data unrelated to the service;
tagging the parts of speech of the training data with the irrelevant data removed, and adding them to the part-of-speech tagging model for training;
repeating the acquisition, deduplication, relevance-filtering, and tagging operations on a once-daily training cycle, so that the part-of-speech tagging model is trained periodically.
Through this periodic model training, the part-of-speech recognition accuracy of the part-of-speech tagging model can be reliably maintained.
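The daily pipeline of embodiment seven can be sketched as follows; `is_service_related` stands in for the neural network relevance classifier (1 for related, 0 for unrelated), and the example items are invented.

```python
def prepare_training_data(hot_words, topics, is_service_related):
    """Merge the day's search hot words and platform topics,
    deduplicate while preserving order, then keep only the items the
    classifier marks as service-related (output 1)."""
    seen = set()
    deduped = []
    for item in hot_words + topics:   # deduplication step
        if item not in seen:
            seen.add(item)
            deduped.append(item)
    return [x for x in deduped if is_service_related(x) == 1]

classifier = lambda x: 0 if x == "stock tips" else 1   # stand-in model
print(prepare_training_data(["open the sunroof", "weather"],
                            ["weather", "stock tips"], classifier))
# → ['open the sunroof', 'weather']
```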
Example eight
Referring to fig. 3, which is a schematic structural diagram of a corpus expansion device for speech recognition provided in an embodiment of the present invention, the device comprises: an analysis module, a recognition module, a first integration module, and a validation module;
the analysis module is used for segmenting the texts in the collected corpus to be expanded through a maximum forward matching algorithm, determining the part of speech of each segmented word in the text, and determining segmented words with pre-marked parts of speech as entities according to a preset context collocation rule;
the recognition module is used for recognizing the corpus to be expanded with a pre-built scene recognition model, and judging texts having two entities as in-set texts;
the first integration module is used for establishing child nodes under the parent nodes retrieved in the pre-established knowledge graph through aggregation calculation, and integrating the entities of the in-set texts into the knowledge graph; and the validation module is used for synchronizing the newly added nodes of the knowledge graph into the speech recognition model through the protocol generation rule so that they take effect as word slots, serving as the speech fallback strategy.
The corpus expansion device for speech recognition provided in this embodiment can perform all the steps and functions of the corpus expansion method for speech recognition provided in any of the above embodiments; the specific functions of the device are not described again here.
Example nine
Fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present invention. The terminal device includes: a processor, a memory, and a computer program, such as a voice recognition corpus expansion program, stored in the memory and executable on the processor. When the processor executes the computer program, the steps in each embodiment of the corpus expansion method for speech recognition, such as steps S1 to S4 shown in fig. 1, are implemented. Alternatively, the processor implements the functions of the modules in the above device embodiments when executing the computer program.
Illustratively, the computer program may be partitioned into one or more modules that are stored in the memory and executed by the processor to implement the invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used for describing the execution process of the computer program in the corpus expansion device for speech recognition. For example, the computer program may be divided into a plurality of modules, and specific functions of each module are described in detail in the corpus expansion method for speech recognition provided in any of the above embodiments, and detailed descriptions of specific functions of the apparatus are omitted here.
The corpus expanding and writing device for voice recognition can be computing equipment such as a desktop computer, a notebook computer, a palm computer and a cloud server. The corpus expansion device for speech recognition may include, but is not limited to, a processor, and a memory. It will be understood by those skilled in the art that the schematic diagram is merely an example of a speech recognition corpus expansion apparatus, and does not constitute a limitation of a speech recognition corpus expansion apparatus, and may include more or less components than those shown, or combine some components, or different components, for example, the speech recognition corpus expansion apparatus may further include an input-output device, a network access device, a bus, etc.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc., the processor is a control center of the speech recognition corpus expansion apparatus, and various interfaces and lines are used to connect various parts of the whole speech recognition corpus expansion apparatus.
The memory may be configured to store the computer program and/or the modules, and the processor implements the various functions of the corpus expansion device for speech recognition by running or executing the computer program and/or the modules stored in the memory and calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function (such as a sound playing function or an image playing function); the data storage area may store data created according to use of the device (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The module integrated with the voice recognition corpus expanding and writing device can be stored in a computer readable storage medium if the module is realized in the form of a software functional unit and sold or used as an independent product. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, read-Only Memory (ROM), random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like.
It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also considered to be within the scope of the present invention.

Claims (10)

1. A corpus expansion method for speech recognition is characterized by comprising the following steps:
segmenting the texts in the collected corpus to be expanded through a maximum forward matching algorithm, determining the part of speech of each segmented word in the text, and determining segmented words with pre-marked parts of speech as entities according to a preset context collocation rule;
recognizing the corpus to be expanded with a pre-built scene recognition model, and judging texts having two entities as in-set texts;
establishing child nodes under the parent nodes retrieved in the pre-established knowledge graph through aggregation calculation, and integrating the entities of the in-set texts into the knowledge graph;
and synchronizing the newly added nodes of the knowledge graph into a speech recognition model through a protocol generation rule so that they take effect as word slots, serving as the speech fallback strategy.
2. The corpus expanding method for speech recognition according to claim 1, wherein said segmenting the collected texts in the corpus to be expanded by the maximum forward matching algorithm specifically comprises:
s201, taking a preset character string with the maximum matching character length from front to back in the text in the corpus to be expanded;
s202, judging whether the character string exists in a preset corpus or not;
when not present, step S203 is performed;
if yes, the character string is successfully segmented, and proceeding to step S205;
s203, deleting the last character of the character string when the character string fails to be segmented;
s204, judging whether the length of the character string is 0;
if yes, word segmentation of the character string with the maximum matching character length has failed, and step S205 is executed;
if not, returning to the step S202;
s205, judging whether the remaining character length of the text in the corpus to be expanded and written after the word segmentation is 0;
if yes, ending word segmentation judgment;
if not, the word segmentation is not finished, and step S206 is executed;
s206, judging whether the remaining character length of the text in the corpus to be expanded and written after the word segmentation is not more than the maximum matching character length;
if yes, taking all the remaining characters as the character string, and returning to the step S202;
if not, in all the remaining characters, the character string with the maximum matching character length is taken again from front to back, and the step S202 is returned.
3. The corpus expansion method for speech recognition according to claim 1, wherein the corpus to be expanded includes text data that cannot be recognized by a preset semantic model library during speech recognition, text data for which instruction execution fails during speech recognition, text data determined to be a repeated instruction during speech recognition, and text data determined to be negative by a preset emotion analysis model during speech recognition;
the determining the part of speech of the participle in the text specifically comprises:
calculating a first probability P_T(n) = C_(T,n)/M that the segmented word T is a noun, and a second probability P_T(v) = C_(T,v)/M that it is a verb;
When the first probability is not less than the second probability and the first probability is not less than a preset word segmentation probability, determining a word T as a noun;
when the second probability is not less than the first probability and the second probability is not less than a preset word segmentation probability, determining the word T as a verb;
wherein M is the total number of occurrences of the word T in the preset training corpus, C_(T,n) is the number of times the word T in the training corpus is tagged as a noun by a preset part-of-speech tagging model, and C_(T,v) is the number of times the word T in the training corpus is tagged as a verb by the part-of-speech tagging model.
4. The corpus expansion method for speech recognition according to claim 1, wherein said establishing child nodes under the parent nodes retrieved in the pre-established knowledge graph through aggregation calculation, and integrating the entities of the in-set texts into the knowledge graph, specifically comprises:
performing node retrieval in the knowledge graph for the two entities of each in-set text through aggregation calculation;
and taking the node whose path is retrieved as the parent node, serving as the scene layer or intention layer of the knowledge graph, and taking the other node as a child node, serving as the word slot layer of the knowledge graph, to complete the integration of the entities of the in-set texts.
5. The corpus expansion method for speech recognition according to claim 1, wherein said method further comprises:
adding an out-of-set root node in the knowledge graph to the out-of-set text which is not determined as the in-set text in the corpus to be expanded;
traversing the newly added out-of-set root nodes, and judging whether any entity of the out-of-set text already exists under a root node;
if such an entity exists, no new root node is added for that entity, and the other entities of the out-of-set text become its child nodes;
when none of the entities of the out-of-set text has a root node, taking the entity ranked first in the weight sequence for that text as the parent node and the later-ranked entities as child nodes;
the weight sequence is a sequence obtained by counting the number of occurrences of all entities in the out-of-set texts and sorting them by that count.
6. The corpus expansion method for speech recognition according to claim 1, wherein said synchronizing the newly added nodes of the knowledge graph into the speech recognition model through the protocol generation rule so that they take effect as word slots, serving as the speech fallback strategy, specifically comprises:
calling a Chinese-English translation interface, carrying out English translation on the node entity, and packaging into a data structure;
and synchronizing the data structure packaged from the text, together with the protocol fields of the text, into the speech recognition model for training, so that the newly added node entity takes effect as a word slot and becomes a feature in the speech recognition model; when a similar spoken utterance is recognized, the word slot is used as a parameter of the speech fallback text to perform the corresponding instruction control.
7. The corpus expander method for speech recognition according to claim 3, wherein after determining the entity, the method further comprises:
acquiring a search hot word in a first preset period from a search website, acquiring network topics and hot words in the first preset period from a mainstream network platform, and taking the acquired search hot word, network topics and hot words as training data;
deduplicating the acquired training data, performing service-relevance judgment on the deduplicated training data through a neural network model, and removing the training data unrelated to the service;
and marking the part of speech of the training data without irrelevant data, and adding the part of speech into the part of speech marking model for training.
8. A corpus expanding and writing device for speech recognition, the device comprising:
the analysis module is used for segmenting the texts in the collected linguistic data to be expanded through a maximum forward matching algorithm, determining the part of speech of the segmented words in the texts, and determining the pre-marked part of speech of the segmented words as an entity according to a preset context matching rule;
the recognition module is used for recognizing the corpus to be expanded with a pre-built scene recognition model, and judging texts having two entities as in-set texts;
the first integration module is used for establishing child nodes under the parent nodes retrieved in the pre-established knowledge graph through aggregation calculation, and integrating the entities of the in-set texts into the knowledge graph;
and the validation module is used for synchronizing the newly added nodes of the knowledge graph into the speech recognition model through the protocol generation rule so that they take effect as word slots, serving as the speech fallback strategy.
9. A terminal device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor executes the computer program to implement the corpus expansion method for speech recognition according to any one of claims 1 to 7.
10. A computer-readable storage medium, comprising a stored computer program, wherein when the computer program runs, the computer-readable storage medium controls an apparatus to execute the corpus expansion method for speech recognition according to any one of claims 1 to 7.
CN202211367688.8A 2022-11-03 2022-11-03 Corpus expansion writing method, apparatus, device and storage medium for speech recognition Pending CN115757814A (en)

Publications (1)

Publication Number Publication Date
CN115757814A true CN115757814A (en) 2023-03-07



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination