CN117828081A - Method, device and storage medium for detecting speech stance - Google Patents

Method, device and storage medium for detecting speech stance

Info

Publication number
CN117828081A
Authority
CN
China
Prior art keywords
detected
text
language
semantic
matrix vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410008601.0A
Other languages
Chinese (zh)
Inventor
罗引
刘宏宇
王宇琪
徐楠
张西娜
曹家
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Wenge Technology Co ltd
Original Assignee
Beijing Zhongke Wenge Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Wenge Technology Co ltd filed Critical Beijing Zhongke Wenge Technology Co ltd
Priority to CN202410008601.0A priority Critical patent/CN117828081A/en
Publication of CN117828081A publication Critical patent/CN117828081A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure relates to the field of natural language processing, and in particular to a method, an apparatus and a storage medium for detecting speech stance. The method comprises the following steps: acquiring a text to be detected, and extracting a person's statement from the text to be detected; acquiring a first semantic matrix vector and at least one second semantic matrix vector, wherein the first semantic matrix vector is the semantic representation vector corresponding to the person's statement, each second semantic matrix vector is the semantic representation vector corresponding to one of at least one stance label, and a stance label is a form of stance toward a preset stance target; calculating the target similarity between the first semantic matrix vector and each second semantic matrix vector; and determining the stance label corresponding to the second semantic matrix vector with the maximum target similarity as the stance of the person's statement toward the preset stance target. The embodiments of the present application address the poor performance of existing speech stance detection.

Description

Method, device and storage medium for detecting speech stance
Technical Field
The present disclosure relates to the field of natural language processing technologies, and in particular to a method, an apparatus and a storage medium for detecting speech stance.
Background
With the rapid development of the internet, more and more users take part in discussions of trending topics through channels such as social media sites and news portals. A trending topic is usually a trending event, and different users discuss it from different angles, so the views and stances they express also differ. Effectively extracting people's statements and studying their stances makes it possible to get a comprehensive picture of a trending topic in a short time.
Speech stance detection classifies a person's view of or attitude toward a person, thing or event as supportive, opposed or neutral, and is generally divided into two stages: statement extraction and stance detection. On the one hand, current statement extraction methods usually extract trigger words through a constructed trigger-word dictionary and then extract people's statements through syntactic analysis; because these methods rely solely on the constructed dictionary, their extraction precision is low. They also extract every statement a given person makes in the text, so the extracted statements may fail to reflect the text's topic. On the other hand, existing stance detection methods mainly treat the stance a text expresses toward a target as a classification problem; such methods usually focus only on mining the overall semantics of the text, so when the target text deviates substantially from the training samples, detection performance is often poor.
Disclosure of Invention
In order to solve the above technical problems, the present application provides a method, an apparatus and a storage medium for detecting speech stance, which can improve the performance of speech stance detection.
In a first aspect, the present application provides a speech stance detection method, comprising: acquiring a text to be detected, and extracting a person's statement from the text to be detected; acquiring a first semantic matrix vector and at least one second semantic matrix vector, wherein the first semantic matrix vector is the semantic representation vector corresponding to the person's statement, each second semantic matrix vector is the semantic representation vector corresponding to one of at least one stance label, and a stance label is a form of stance toward a preset stance target; calculating the target similarity between the first semantic matrix vector and each second semantic matrix vector; and determining the stance label corresponding to the second semantic matrix vector with the maximum target similarity as the stance of the person's statement toward the preset stance target.
In a second aspect, the present application provides a speech stance detection apparatus, comprising: an extraction module for acquiring a text to be detected and extracting a person's statement from the text to be detected; an acquisition module for acquiring a first semantic matrix vector and at least one second semantic matrix vector, wherein the first semantic matrix vector is the semantic representation vector corresponding to the person's statement, each second semantic matrix vector is the semantic representation vector corresponding to one of at least one stance label, and a stance label is a form of stance toward a preset stance target; a calculation module for calculating the target similarity between the first semantic matrix vector and each second semantic matrix vector; and a determination module for determining the stance label corresponding to the second semantic matrix vector with the maximum target similarity as the stance of the person's statement toward the preset stance target.
In a third aspect, the present application provides an electronic device, comprising: a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the speech stance detection method of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the speech stance detection method of the first aspect.
In a fifth aspect, the present application provides a computer program product which, when run on a computer, causes the computer to implement the speech stance detection method of the first aspect.
Compared with the prior art, the technical solution provided by the present application has the following advantages. First, a text to be detected is acquired and a person's statement is extracted from it. Then a first semantic matrix vector and at least one second semantic matrix vector are acquired, where the first semantic matrix vector is the semantic representation vector corresponding to the person's statement and each second semantic matrix vector is the semantic representation vector corresponding to one stance label, and the target similarity between the first semantic matrix vector and each second semantic matrix vector is calculated. Finally, the stance label corresponding to the second semantic matrix vector with the maximum target similarity is determined as the stance of the person's statement toward the preset stance target. In this way, when determining the stance of the person's statement toward the preset stance target, the stance label whose semantic representation vector has the maximum target similarity with that of the person's statement is taken as the final stance; that is, the influence of the stance labels' own semantics on the final stance is incorporated into the stance detection process, which makes the stance detection result more accurate. In addition, because the final stance is determined directly from the target similarity between the semantic representation vector of the person's statement and those of the stance labels, the method does not suffer when the target text deviates substantially from the training samples, which avoids the resulting poor detection and further improves the performance of speech stance detection.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a first flow chart of a speech stance detection method according to an embodiment of the present application;
Fig. 2 is a second flow chart of a speech stance detection method according to an embodiment of the present application;
Fig. 3 is a third flow chart of a speech stance detection method according to an embodiment of the present application;
Fig. 4 is a fourth flow chart of a speech stance detection method according to an embodiment of the present application;
Fig. 5 is a fifth flow chart of a speech stance detection method according to an embodiment of the present application;
Fig. 6 is a first schematic structural diagram of a speech stance detection apparatus according to an embodiment of the present application;
Fig. 7 is a second schematic structural diagram of a speech stance detection apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order that the above objects, features and advantages of the present application may be more clearly understood, a further description of the aspects of the present application will be provided below. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application; however, the present application may also be practiced otherwise than as described herein. Clearly, the embodiments in this specification are only some, not all, of the embodiments of the application.
The speech stance detection method provided by the embodiments of the present application may be executed by a speech stance detection apparatus, which may be hardware or software. When the speech stance detection apparatus is hardware, it may be any of various electronic devices capable of running speech stance detection, including but not limited to a vehicle-mounted device, an intelligent vehicle, a mobile phone, a computer, a tablet computer, a television, a smart television, a laser projection device, a monitor, an electronic whiteboard, an electronic desktop, and the like. When the speech stance detection apparatus is software, it may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules or as a single piece of software or software module, which is not specifically limited here.
Fig. 1 is a flow chart of a speech stance detection method according to an embodiment of the present application. As shown in fig. 1, the speech stance detection method may include the following steps:
S11, acquiring a text to be detected, and extracting a person's statement from the text to be detected.
First, the text to be detected is acquired. Specifically, the text to be detected may be obtained by converting data in another format into text format, or a text already in text format may be acquired directly.
In some embodiments, the text to be detected may be any of various texts such as news articles and dialogue texts.
Next, a person's statement is extracted from the text to be detected.
In some embodiments, as shown in fig. 2, extracting a person's statement from the text to be detected may include the following steps:
S111, acquiring initial trigger words, and constructing a statement trigger-word dictionary from the initial trigger words.
In some embodiments, as shown in fig. 3, acquiring the initial trigger words and constructing the statement trigger-word dictionary from them may include the following steps:
S1111, acquiring initial trigger words.
Specifically, the initial trigger words may be determined from the preset stance target of step S11: for example, relevant personnel may enumerate initial trigger words according to the preset stance target, the domain in which the preset stance target lies, and so on. Professional tools may also be used to find potential initial trigger words: for example, text analysis tools or keyword extraction tools may extract, from a corpus related to the preset stance target, verbs whose occurrence frequency exceeds a frequency threshold as initial trigger words. Of course, the initial trigger words may be obtained in other ways as well, which is not limited in this application.
In some embodiments, the number of initial trigger words is smaller than a first threshold. The first threshold is preset, for example a default value or a value set by relevant personnel according to the actual situation; for example, the first threshold is 20.
Illustratively, initial trigger words for a preset stance target are acquired from news data, and the corresponding code can be expressed as: feed_word = get_feed(public_data), where feed_word denotes the initial trigger words and public_data denotes the news data. As another example, the acquired initial trigger words include "say", "express", "tell", "indicate", "disclose", "claim", and the like.
S1112, traversing a preset word forest to find near-synonyms of the initial trigger words.
The preset word forest comprises a collection of near-synonym groups, such as the HIT synonym forest (Tongyici Cilin), the HowNet synonym forest, Wikipedia, and the like.
Specifically, the near-synonym groups of the preset word forest are traversed with the initial trigger words; if a near-synonym group contains the current initial trigger word and more than a first number of the words it contains are already in the established trigger-word list, all words in that near-synonym group are taken as near-synonyms of the initial trigger word. The corresponding code can be expressed as: feed_synonym = search_synonyms(feed_word, synonym_forest), where feed_synonym denotes the near-synonyms of an initial trigger word, feed_word denotes the initial trigger word, and synonym_forest denotes the preset word forest. The first number is greater than a second threshold; the second threshold is preset, for example a default value or a value set by relevant personnel according to the actual situation, for example 3 or 4.
S1113, screening at least one target trigger word from the initial trigger words and the near-synonyms to obtain the statement trigger-word dictionary.
Specifically, the target trigger words may be screened by having relevant personnel manually select, according to the actual situation, a second number of the initial trigger words and near-synonyms most related to the preset stance target as the target trigger words. The corresponding code can be expressed as: expression_words = filter(feed_word, feed_synonym), where expression_words denotes the target trigger words, feed_word denotes the initial trigger words, and feed_synonym denotes the near-synonyms of the initial trigger words. The second number is smaller than a third threshold; the third threshold is preset, for example a default value or a value set by relevant personnel according to the actual situation, for example 100.
In the above scheme, near-synonyms of the initial trigger words are found by traversing the preset word forest, and at least one target trigger word is screened from the initial trigger words and the near-synonyms to obtain the statement trigger-word dictionary. The initial trigger words are thus fully expanded when the statement trigger-word dictionary is constructed, making the dictionary more comprehensive. In addition, keeping the number of target trigger words below the third threshold avoids the interference that too many target trigger words would cause in the subsequent statement extraction, improving the accuracy of the extracted statements.
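Illustratively, the flow of steps S1111 to S1113 can be sketched in Python as follows. The forest file format, the helper names and the automatic final cap are assumptions made for illustration only; the patent itself leaves the final screening to relevant personnel.

```python
# Minimal sketch of S1111-S1113, assuming the word forest is a text file
# with one near-synonym group per line, words separated by spaces.

def load_synonym_forest(path):
    with open(path, encoding="utf-8") as f:
        return [set(line.split()) for line in f if line.strip()]

def search_synonyms(feed_word, synonym_forest, second_threshold=3):
    """A group qualifies if it contains a current initial trigger word and
    more than `second_threshold` of its words are already in the trigger list."""
    feed_set, feed_synonym = set(feed_word), set()
    for group in synonym_forest:
        if group & feed_set and len(group & feed_set) > second_threshold:
            feed_synonym |= group
    return feed_synonym - feed_set

feed_word = ["say", "express", "tell", "indicate", "disclose", "claim"]
synonym_forest = load_synonym_forest("synonym_forest.txt")  # assumed file
feed_synonym = search_synonyms(feed_word, synonym_forest)
# S1113 is a manual screening in the patent; capping the dictionary at the
# third threshold (100) merely stands in for that step here.
expression_words = sorted(set(feed_word) | feed_synonym)[:100]
```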
S112, performing an analysis operation on the text to be detected to obtain the dependency syntax structure of each sentence to be detected and the part of speech and named entity type of each word in each sentence to be detected.
Wherein the sentences to be detected belong to the text to be detected.
In some embodiments, as shown in fig. 4, the analysis operation on the text to be detected, which obtains the dependency syntax structure of each sentence to be detected and the part of speech and named entity type of each word in it, may include the following steps:
S1121, deleting the html tags and special characters in the text to be detected, and converting the text from traditional to simplified Chinese to obtain a preprocessed text.
The html tags include at least tags such as <p> and <image>, and the special characters include at least characters such as [ ] and < >.
Specifically, the html tags and special characters in the text to be detected are deleted and the text is converted from traditional to simplified Chinese to obtain the preprocessed text. The corresponding code can be expressed as: ptext = cht_to_chs(text), where ptext denotes the preprocessed text, cht_to_chs(·) denotes converting traditional Chinese text to simplified Chinese text, and text denotes the text to be detected.
S1122, splitting the preprocessed text into sentences to obtain at least one sentence to be detected.
Specifically, the preprocessed text is split at complete-sentence delimiters to obtain at least one sentence to be detected, where the complete-sentence delimiters include at least the period (。) and the question mark (？). The corresponding code can be expressed as: sentences = sentence_split(ptext), where sentences denotes the sentences to be detected, ptext denotes the preprocessed text, and sentence_split(·) denotes sentence splitting.
S1123, performing analysis and labeling operations on each sentence to be detected with a preset tool to obtain the dependency syntax structure of each sentence to be detected and the part of speech and named entity type of each word in it.
Specifically, the analysis and labeling operations are performed on each sentence to be detected using pyltp, the Python interface of the Language Technology Platform (LTP), the open-source natural language processing (NLP) toolkit of the Harbin Institute of Technology, to obtain the dependency syntax structure of each sentence to be detected and the part of speech and named entity type of each word in it.
The analysis and labeling operations comprise word segmentation, part-of-speech tagging, named entity recognition and dependency syntax parsing. The parts of speech include at least nouns, verbs, adjectives, etc.; the named entities include at least person entities (e.g., person names), organization entities (e.g., organization names), geographic entities (e.g., place names), etc.; and the dependency syntax structures include at least subject-predicate relations, verb-object relations, etc.
The corresponding code can be expressed as: words, postags, ner, arcs = ltp(sentences), where words denotes the words, postags denotes the parts of speech, ner denotes the named entities, arcs denotes the dependency syntax structures, and sentences denotes the sentences to be detected.
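A sketch of S1123 with the pyltp package is given below; the model file paths refer to the separately distributed pretrained LTP models, and the exact loading API differs slightly between pyltp versions, so this is an assumption-laden illustration rather than a definitive implementation.

```python
from pyltp import NamedEntityRecognizer, Parser, Postagger, Segmentor

# The pretrained LTP models are distributed separately; paths are assumptions.
seg, pos, rec, par = Segmentor(), Postagger(), NamedEntityRecognizer(), Parser()
seg.load("ltp_data/cws.model")
pos.load("ltp_data/pos.model")
rec.load("ltp_data/ner.model")
par.load("ltp_data/parser.model")

def ltp(sentences):
    """Return (words, postags, ner, arcs) for each sentence to be detected."""
    results = []
    for sent in sentences:
        words = list(seg.segment(sent))            # word segmentation
        postags = list(pos.postag(words))          # e.g. 'v' verb, 'n' noun, 'r' pronoun
        ner = list(rec.recognize(words, postags))  # e.g. 'S-Nh'; Nh marks person names
        arcs = par.parse(words, postags)           # each arc has .head and .relation
        results.append((words, postags, ner, arcs))
    return results
```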
In the above scheme, the html tags and special characters in the text to be detected are deleted, the text is converted from traditional to simplified Chinese to obtain a preprocessed text, and the preprocessed text is split into at least one sentence to be detected. Finally, each sentence to be detected is analyzed and labeled with a preset tool to obtain its dependency syntax structure and the part of speech and named entity type of each of its words, so that the person's statements can be extracted from this information and the speech stance detection method can be realized.
S113, extracting at least one statement event from the text to be detected according to the dependency syntax structure of each sentence to be detected and the part of speech and named entity type of each word in each sentence to be detected.
In some embodiments, the at least one statement event may be extracted from the text to be detected by performing the following statement analysis operation on each sentence to be detected in the text.
The statement analysis operation includes: extracting the statement trigger word and the central person in each sentence to be detected to obtain a primary statement event; and extracting the modifier of the statement trigger word and adding the modifier to the primary statement event to obtain at least one statement event. The statement trigger word is a verb and hits the statement trigger-word dictionary; the central person's part of speech is a noun or pronoun, its named entity type is a person entity, and its dependency relation with the statement trigger word is a subject-predicate relation; the modifier misses the statement trigger-word dictionary, and its dependency relation with the statement trigger word is an adverbial relation.
First, the statement trigger word and the central person in each sentence to be detected are extracted to obtain a primary statement event.
In some embodiments, the statement trigger word in each sentence to be detected may be extracted as follows: when the part of speech of a first word in the target sentence to be detected is a verb and the first word hits the statement trigger-word dictionary, the first word is determined to be the statement trigger word of the target sentence. The target sentence to be detected is any one of the sentences to be detected, and the first word is any word of the target sentence. The corresponding code can be expressed as: speech_words = filter(postags, words, expression_words), where speech_words denotes the statement trigger words, postags denotes the parts of speech, words denotes the words, and expression_words denotes the constructed target trigger words, i.e. the statement trigger-word dictionary.
In some embodiments, the central person in each sentence to be detected may be extracted as follows: when the part of speech of a second word in the target sentence is a noun or pronoun, the named entity type of the second word is a person entity, and the dependency relation between the second word and the statement trigger word is a subject-predicate relation, the second word is determined to be the central person of the target sentence. The second word is any word in the target sentence and is different from the first word. The corresponding code can be expressed as: person = subject_identify(speech_words, arcs, postags, ner), where person denotes the central person, speech_words denotes the statement trigger word, arcs denotes the dependency syntax structure, postags denotes the parts of speech, and ner denotes the named entities.
After the central person and the statement trigger word of each sentence to be detected are extracted, they form a primary statement event. The corresponding code can be expressed as: opinion1 = get_expressions(sentence, person, speech_words), where opinion1 denotes the primary statement event, sentence denotes the sentence to be detected, person denotes the central person, and speech_words denotes the statement trigger word.
Next, the modifier of the statement trigger word is extracted.
In some embodiments, the modifier of the statement trigger word may be extracted as follows: when a third word in the target sentence misses the statement trigger-word dictionary but its dependency relation with the statement trigger word is an adverbial relation, the third word is determined to be a modifier of the statement trigger word. The third word is any word in the target sentence and is different from the first and second words. The corresponding code can be expressed as: qualifier = get_modifiers(words, arcs, speech_words), where qualifier denotes the modifiers of the statement trigger words.
Finally, the modifier is added to the primary statement event to obtain at least one statement event. The corresponding code can be expressed as: opinion = modification(opinion1, person, speech_words, qualifier, arcs), where opinion denotes the statement event, opinion1 denotes the primary statement event, person denotes the central person, speech_words denotes the statement trigger word, qualifier denotes the modifier of the statement trigger word, and arcs denotes the dependency syntax structure.
In the above scheme, the statement trigger word and central person of each sentence to be detected are extracted to obtain a primary statement event, and the modifiers of the statement trigger word are extracted and added to the primary statement event to obtain at least one statement event. In other words, the extraction of statement events relies not only on the statement trigger-word dictionary and the central person but also incorporates the modifiers, which improves the accuracy of the extracted statement events.
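The statement analysis operation can be sketched over the pyltp outputs of S1123 and the trigger dictionary of S111. The LTP conventions assumed here (1-based arc heads, 'SBV' for subject-predicate arcs, 'ADV' for adverbial arcs, ner tags ending in 'Nh' and postags starting with 'n' for person names) are the toolkit's, not terms the patent defines.

```python
def speech_analysis(words, postags, ner, arcs, expression_words):
    """One statement analysis pass (S113) over a single analyzed sentence;
    returns a statement event dict or None if no event is found."""
    trigger = None
    for i, w in enumerate(words):
        # statement trigger word: a verb that hits the trigger-word dictionary
        if postags[i] == "v" and w in expression_words:
            trigger = i
            break
    if trigger is None:
        return None
    person, qualifiers = None, []
    for i, w in enumerate(words):
        head, rel = arcs[i].head - 1, arcs[i].relation  # head is 1-based in LTP
        if head != trigger:
            continue
        # central person: noun/pronoun, person entity, subject-predicate arc
        if rel == "SBV" and (postags[i].startswith("n") or postags[i] == "r") \
                and ner[i].endswith("Nh"):
            person = w
        # modifier: misses the dictionary, adverbial arc to the trigger word
        elif rel == "ADV" and w not in expression_words:
            qualifiers.append(w)
    if person is None:
        return None
    return {"person": person, "trigger": words[trigger], "modifiers": qualifiers}
```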
S114, classifying the at least one statement event according to the semantic information of each statement event to obtain a person-statement list.
The person-statement list comprises persons and the statements corresponding to each person.
Specifically, the at least one statement event may be classified with a classification method according to its semantic information to obtain the person-statement list. The corresponding code can be expressed as: viewpoint = binary_classify(opinion), where viewpoint denotes the statements corresponding to a person and opinion denotes the at least one statement event.
S115, extracting the text topic of the text to be detected, and determining the central statement of each person according to the text topic and the person-statement list.
First, the text topic of the text to be detected is extracted.
Specifically, important words in the text are extracted with a keyword extraction method and concatenated to obtain the text topic. The keyword extraction method may be any method capable of extracting keywords from the text to be detected, such as term frequency-inverse document frequency (TF-IDF), TextRank, or latent Dirichlet allocation (LDA). The corresponding code can be expressed as: topic = keyword_extract(text), where topic denotes the text topic and text denotes the text to be detected.
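A minimal sketch of the topic extraction follows, assuming jieba's built-in TF-IDF keyword extractor; the patent names TF-IDF as one option but does not prescribe a library, and the topK value is illustrative.

```python
import jieba.analyse  # common Chinese TF-IDF keyword extractor (an assumed choice)

def keyword_extract(text, top_k=5):
    """Extract the most important words and concatenate them into the text topic."""
    return " ".join(jieba.analyse.extract_tags(text, topK=top_k))

topic = keyword_extract("某地发布新能源汽车补贴政策，多位专家表示支持……")
```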
Next, the central statement of each person is determined according to the text topic and the person-statement list.
In some embodiments, the central statement of each person may be determined by performing the following computing operation on each person's statements and the text topic.
The computing operation includes: calculating the similarity between each statement of the target person and the text topic, and determining the statement with the highest similarity as the central statement of the target person, where the target person is any one of the persons.
Specifically, the similarity between each statement of the target person and the text topic may be calculated with a similarity algorithm, for example the Jaccard algorithm or a semantics-based similarity calculation method; finally, the statement with the highest similarity is determined as the central statement of the target person. The corresponding code can be expressed as: speech = max(compute_sim(topic, views)), where speech denotes the central statement, topic denotes the text topic, and views denotes the statements corresponding to a given person.
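A sketch of the computing operation with Jaccard similarity is given below; comparing character sets is the simplest variant, and segmented word sets would work the same way.

```python
def jaccard(a, b):
    """Jaccard similarity between two texts over their character sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def center_speech(topic, views):
    """Return the statement of one person most similar to the text topic."""
    return max(views, key=lambda view: jaccard(topic, view))

views = ["他表示坚决支持该政策", "他还谈到了天气"]
speech = center_speech("新能源 汽车 补贴 政策", views)
```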
S116, determining the central statement of each person as that person's statement.
In the above scheme, when extracting the person's statements, the person-statement list is extracted first, and then the text topic of the text to be detected is used to pick each person's central statement from the list, yielding each person's statement. This makes each person's statement closer to the text topic of the text to be detected and reduces the number of statements, ensuring the accuracy of statement extraction while reducing its complexity.
S12, acquiring a first semantic matrix vector and at least one second semantic matrix vector.
The first semantic matrix vector is the semantic representation vector corresponding to the person's statement, and each second semantic matrix vector is the semantic representation vector corresponding to one of at least one stance label; a stance label is a form of stance toward a preset stance target.
In some embodiments, as shown in fig. 5, the manner of obtaining the first semantic matrix vector and the at least one second semantic matrix vector may include the steps of:
S121, acquiring at least one stance label.
The stance labels include support, oppose and neutral.
S122, concatenating the person's statement with the at least one stance label, and inputting the result into a semantic conversion model to obtain the first semantic matrix vector and the at least one second semantic matrix vector.
The semantic conversion model is pre-trained and is used to convert text into the semantic matrix vector corresponding to the text.
In some embodiments, concatenating the person's statement with the at least one stance label and inputting the result into the semantic conversion model to obtain the first semantic matrix vector and the at least one second semantic matrix vector may be realized according to a first formula: h_speech, h_support, h_oppose, h_neutral = Ernie([CLS]_1 [L]_1 support [L]_2 oppose [L]_3 neutral [CLS]_2 speech [SEP]), where [CLS]_1 [L]_1 support [L]_2 oppose [L]_3 neutral [CLS]_2 speech [SEP] denotes the person's statement (speech) and the stance labels spliced together via the [CLS], [L] and [SEP] separators, and Ernie denotes the semantic conversion model.
For example, the first semantic matrix vector and the at least one second semantic matrix vector may be expressed as h_speech, h_support, h_oppose and h_neutral, where h_speech denotes the first semantic matrix vector, h_support denotes the semantic matrix vector corresponding to the support label, h_oppose denotes the semantic matrix vector corresponding to the oppose label, and h_neutral denotes the semantic matrix vector corresponding to the neutral label; that is, the second semantic matrix vectors comprise h_support, h_oppose and h_neutral.
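A sketch of S122 with the Hugging Face transformers API is shown below. The bert-base-chinese checkpoint, the newly added [L1]/[L2]/[L3]/[CLS2] marker tokens and the readout at the marker positions are all assumptions standing in for the patent's pre-trained ERNIE-style model; the new marker embeddings start untrained and would need fine-tuning before the vectors are meaningful.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Stand-in encoder; the patent's model is ERNIE-style and pre-trained.
tok = AutoTokenizer.from_pretrained("bert-base-chinese")
enc = AutoModel.from_pretrained("bert-base-chinese")
markers = ["[L1]", "[L2]", "[L3]", "[CLS2]"]
tok.add_special_tokens({"additional_special_tokens": markers})
enc.resize_token_embeddings(len(tok))  # new marker embeddings start untrained

def encode(speech):
    """[CLS] [L1] 支持 [L2] 反对 [L3] 中立 [CLS2] speech [SEP]; the hidden state
    at each marker serves as that label's (or the statement's) vector."""
    ids = tok(f"[L1]支持[L2]反对[L3]中立[CLS2]{speech}", return_tensors="pt")
    with torch.no_grad():
        h = enc(**ids).last_hidden_state[0]  # (seq_len, hidden)
    pos = {m: (ids["input_ids"][0] == tok.convert_tokens_to_ids(m))
              .nonzero()[0, 0] for m in markers}
    return (h[pos["[CLS2]"]],                                # h_speech
            h[pos["[L1]"]], h[pos["[L2]"]], h[pos["[L3]"]])  # label vectors

h_speech, h_support, h_oppose, h_neutral = encode("他表示坚决支持该政策")
```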
In some embodiments, the first semantic matrix vector and the at least one second semantic matrix vector may alternatively be obtained by converting the person's statement and the at least one stance label into semantic matrix vectors with a matrix conversion algorithm, where the matrix conversion algorithm or model is any one capable of converting text into a semantic matrix, e.g. a word embedding model or the Word2Vec algorithm.
In some embodiments, the preset stance target may be any of various targets toward which a stance can be expressed, such as an entity, a statement, a viewpoint or a topic.
S13, calculating the target similarity between the first semantic matrix vector and each second semantic matrix vector.
Specifically, the target similarity between the first semantic matrix vector and each second semantic matrix vector may be calculated as the cosine similarity between the first semantic matrix vector and each second semantic matrix vector.
For example, the target similarity between the first semantic matrix vector and each second semantic matrix vector may be calculated according to a second formula: prob = cosine_similarity(h_speech, h), where h = [h_support, h_oppose, h_neutral], h_support denotes the semantic representation vector corresponding to the support label, h_oppose denotes the semantic representation vector corresponding to the oppose label, h_neutral denotes the semantic representation vector corresponding to the neutral label, and prob denotes the set of target similarities corresponding to the stance labels.
More specifically, the cosine similarity between the first semantic matrix vector and the semantic representation vector corresponding to the support label is calculated to obtain the target similarity corresponding to the support label; the cosine similarity between the first semantic matrix vector and the semantic representation vector corresponding to the oppose label is calculated to obtain the target similarity corresponding to the oppose label; and the cosine similarity between the first semantic matrix vector and the semantic representation vector corresponding to the neutral label is calculated to obtain the target similarity corresponding to the neutral label.
S14, determining the stance label corresponding to the second semantic matrix vector with the maximum target similarity as the stance of the person's statement toward the preset stance target.
For example, the stance label corresponding to the second semantic matrix vector with the maximum target similarity may be determined as the stance of the person's statement toward the preset stance target according to a third formula: o_t = argmax(softmax(prob)), where o_t denotes the index of the stance label with the maximum target similarity, prob denotes the target similarities, and softmax(·) is a normalization function.
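The second and third formulas can be sketched in a few lines of NumPy; the random demo vectors merely stand in for the encoder outputs of S122.

```python
import numpy as np

def cosine_similarity(h_speech, h):
    """Second formula: cosine similarity between the statement vector and
    each row of h (one row per stance label)."""
    h, v = np.asarray(h), np.asarray(h_speech)
    return (h @ v) / (np.linalg.norm(h, axis=1) * np.linalg.norm(v))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

labels = ["support", "oppose", "neutral"]
rng = np.random.default_rng(0)
h_speech, h_support, h_oppose, h_neutral = rng.normal(size=(4, 768))  # demo vectors
prob = cosine_similarity(h_speech, [h_support, h_oppose, h_neutral])
o_t = int(np.argmax(softmax(prob)))  # third formula
stance = labels[o_t]
```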
In the above scheme, a text to be detected is first acquired and a person's statement is extracted from it. Then a first semantic matrix vector and at least one second semantic matrix vector are acquired, where the first semantic matrix vector is the semantic representation vector corresponding to the person's statement and each second semantic matrix vector is the semantic representation vector corresponding to one stance label, and the target similarity between the first semantic matrix vector and each second semantic matrix vector is calculated. Finally, the stance label corresponding to the second semantic matrix vector with the maximum target similarity is determined as the stance of the person's statement toward the preset stance target. In this way, when determining the stance of the person's statement toward the preset stance target, the stance label whose semantic representation vector has the maximum target similarity with that of the statement is taken as the final stance; that is, the influence of the stance labels' own semantics on the final stance is incorporated into the stance detection process, which makes the stance detection result more accurate. Moreover, because the final stance is determined directly from the target similarity between the semantic representation vector of the person's statement and those of the stance labels, the method is unaffected by large deviations between the target text and the training samples, which avoids the resulting poor detection and further improves the performance of speech stance detection.
The embodiments of the present application may divide the speech stance detection apparatus into functional modules according to the above method examples: for example, each function may be assigned its own functional module, or two or more functions may be integrated into one processing unit. The integrated modules may be implemented in hardware or as software functional modules. It should be noted that the division into modules in the embodiments of the present application is illustrative and is merely a division by logical function; other divisions are possible in actual implementations.
Fig. 6 is a schematic structural diagram of a speech stance detection apparatus provided by an embodiment of the present application; the apparatus includes an extraction module 61, an acquisition module 62, a calculation module 63 and a determination module 64.
The extraction module 61 is configured to acquire a text to be detected and extract a person's statement from the text to be detected. The acquisition module 62 is configured to acquire a first semantic matrix vector and at least one second semantic matrix vector, where the first semantic matrix vector is the semantic representation vector corresponding to the person's statement, each second semantic matrix vector is the semantic representation vector corresponding to one of at least one stance label, and a stance label is a form of stance toward a preset stance target. The calculation module 63 is configured to calculate the target similarity between the first semantic matrix vector and each second semantic matrix vector. The determination module 64 is configured to determine the stance label corresponding to the second semantic matrix vector with the maximum target similarity as the stance of the person's statement toward the preset stance target.
In some embodiments, as shown in fig. 7, the extraction module 61 includes a construction sub-module 611, an analysis sub-module 612, an extraction sub-module 613, a classification sub-module 614 and a determination sub-module 615. The construction sub-module 611 is configured to acquire initial trigger words and construct a statement trigger-word dictionary from them. The analysis sub-module 612 is configured to perform an analysis operation on the text to be detected to obtain the dependency syntax structure of each sentence to be detected and the part of speech and named entity type of each word in it, where the sentences to be detected belong to the text to be detected. The extraction sub-module 613 is configured to extract at least one statement event from the text to be detected according to the dependency syntax structure of each sentence to be detected and the part of speech and named entity type of each word in it. The classification sub-module 614 is configured to classify the at least one statement event according to the semantic information of each statement event to obtain a person-statement list, where the person-statement list includes persons and the statements corresponding to each person. The determination sub-module 615 is configured to extract the text topic of the text to be detected, determine the central statement of each person according to the text topic and the person-statement list, and determine the central statement of each person as that person's statement.
In some embodiments, the construction sub-module 611 is specifically configured to: acquire initial trigger words; traverse a preset word forest to find near-synonyms of the initial trigger words; and screen at least one target trigger word from the initial trigger words and the near-synonyms to obtain the statement trigger-word dictionary.
In some embodiments, the analysis sub-module 612 is specifically configured to: delete the html tags and special characters in the text to be detected and convert the text from traditional to simplified Chinese to obtain a preprocessed text; split the preprocessed text into sentences to obtain at least one sentence to be detected; and perform analysis and labeling operations on each sentence to be detected with a preset tool to obtain the dependency syntax structure of each sentence to be detected and the part of speech and named entity type of each word in it.
In some embodiments, the extraction sub-module 613 is specifically configured to perform the following statement analysis operation on each sentence to be detected in the text to obtain at least one statement event. The statement analysis operation includes: extracting the statement trigger word and the central person in each sentence to be detected to obtain a primary statement event; and extracting the modifier of the statement trigger word and adding the modifier to the primary statement event to obtain at least one statement event. The statement trigger word is a verb and hits the statement trigger-word dictionary; the central person's part of speech is a noun or pronoun, its named entity type is a person entity, and its dependency relation with the statement trigger word is a subject-predicate relation; the modifier misses the statement trigger-word dictionary, and its dependency relation with the statement trigger word is an adverbial relation.
In some embodiments, the determination sub-module 615 is specifically configured to perform a computing operation on each person's statements and the text topic to determine the central statement of each person. The computing operation includes: calculating the similarity between each statement of the target person and the text topic, and determining the statement with the highest similarity as the central statement of the target person, where the target person is any one of the persons.
In some embodiments, the acquisition module 62 is specifically configured to: acquire at least one stance label, where the stance labels include support, oppose and neutral; and concatenate the person's statement with the at least one stance label and input the result into a semantic conversion model to obtain the first semantic matrix vector and the at least one second semantic matrix vector, where the semantic conversion model is used to convert text into the semantic matrix vector corresponding to the text.
In some embodiments, the calculation module 63 is specifically configured to determine the cosine similarity between the first semantic matrix vector and each second semantic matrix vector as the target similarity.
The speech stance detection apparatus provided in this embodiment can execute the speech stance detection method provided in the above method embodiments; its implementation principle and technical effects are similar and are not repeated here.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
As shown in fig. 8, an embodiment of the present application provides an electronic device, including: a processor 801, a memory 802, and a computer program stored on the memory 802 and executable on the processor 801. The computer program, when executed by the processor 801, implements the respective processes of the speech stance detection method in the above method embodiments and can achieve the same technical effects; to avoid repetition, they are not described again here.
An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored. The computer program, when executed by a processor, implements the respective processes of the speech stance detection method in the above method embodiments and can achieve the same technical effects; to avoid repetition, they are not described again here.
The computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
An embodiment of the present application provides a computer program product storing a computer program. The computer program, when executed by a processor, implements the respective processes of the speech stance detection method in the above method embodiments and can achieve the same technical effects; to avoid repetition, they are not described again here.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. And the present application may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein.
In this application, the processor may be a central processing unit (CPU), or another general-purpose processor such as a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor.
In the present application, the memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
In the present application, computer-readable media include permanent and non-permanent, removable and non-removable storage media. A storage medium may implement information storage by any method or technology, and the information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media) such as modulated data signals and carrier waves.
It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprise", "comprising" and any variations thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus comprising a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus comprising that element.
The foregoing is merely a specific embodiment of the application to enable one skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A speech stance detection method, comprising:
acquiring a text to be detected, and extracting a person's statement from the text to be detected;
acquiring a first semantic matrix vector and at least one second semantic matrix vector, wherein the first semantic matrix vector is the semantic representation vector corresponding to the person's statement, each second semantic matrix vector is the semantic representation vector corresponding to one of at least one stance label, and a stance label is a form of stance toward a preset stance target;
calculating a target similarity between the first semantic matrix vector and each second semantic matrix vector; and
determining the stance label corresponding to the second semantic matrix vector with the maximum target similarity as the stance of the person's statement toward the preset stance target.
2. The speech stance detection method of claim 1, wherein extracting the person's statement from the text to be detected comprises:
acquiring initial trigger words, and constructing a statement trigger-word dictionary from the initial trigger words;
performing an analysis operation on the text to be detected to obtain a dependency syntax structure of each sentence to be detected and a part of speech and named entity type of each word in each sentence to be detected, the sentences to be detected belonging to the text to be detected;
extracting at least one statement event from the text to be detected according to the dependency syntax structure of each sentence to be detected and the part of speech and named entity type of each word in each sentence to be detected;
classifying the at least one statement event according to semantic information of each statement event to obtain a person-statement list, wherein the person-statement list comprises persons and statements corresponding to each person;
extracting a text topic of the text to be detected, and determining a central statement of each person according to the text topic and the person-statement list; and
determining the central statement of each person as that person's statement.
3. The method for detecting a speech position according to claim 2, wherein acquiring an initial trigger word and constructing a speech trigger-word dictionary according to the initial trigger word comprises:
acquiring an initial trigger word;
traversing a preset synonym forest to search for synonyms of the initial trigger word;
and screening at least one target trigger word from the initial trigger word and the synonyms to obtain the speech trigger-word dictionary.
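For illustration only, outside the claims: a sketch of the dictionary construction, assuming the synonym forest is available as a word-to-synonyms mapping (e.g., loaded from a Cilin-style resource) and that screening keeps verb candidates; both assumptions go beyond what the claim specifies.

    def build_trigger_dictionary(initial_triggers, synonym_forest, is_verb):
        # synonym_forest: dict mapping a word to a set of its synonyms (assumed format).
        candidates = set(initial_triggers)
        for word in initial_triggers:          # traverse the forest from each seed word
            candidates |= synonym_forest.get(word, set())
        # Screening rule (assumption): keep only verbs as target trigger words.
        return {w for w in candidates if is_verb(w)}

    # Hypothetical usage:
    # forest = {"说": {"表示", "指出", "称"}}
    # triggers = build_trigger_dictionary(["说"], forest, lambda w: True)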
4. The method for detecting a speech position according to claim 2, wherein parsing the text to be detected to obtain the dependency syntax structure of each sentence to be detected and the part of speech and named entity type of each word in each sentence to be detected comprises:
deleting HTML tags and special characters from the text to be detected, and converting traditional Chinese characters in the text to be detected into simplified characters to obtain a preprocessed text;
splitting the preprocessed text into sentences to obtain at least one sentence to be detected;
and parsing and labeling each sentence to be detected with a preset tool to obtain the dependency syntax structure of each sentence to be detected and the part of speech and named entity type of each word in each sentence to be detected.
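For illustration only, outside the claims: a preprocessing sketch, assuming OpenCC for the traditional-to-simplified conversion and a regex splitter on Chinese end punctuation; the parsing and labeling itself is delegated to whichever preset tool is configured and is not shown.

    import re
    from opencc import OpenCC  # assumed library for traditional -> simplified conversion

    def preprocess(raw_text):
        text = re.sub(r"<[^>]+>", "", raw_text)             # delete HTML tags
        text = re.sub(r"[\u0000-\u001f\u200b]", "", text)   # delete special/control characters
        text = OpenCC("t2s").convert(text)                  # traditional -> simplified Chinese
        # Split into sentences to be detected on Chinese end punctuation.
        return [s for s in re.split(r"(?<=[。！？])", text) if s.strip()]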
5. The method for detecting a speech position according to claim 2, wherein extracting at least one speech event from the text to be detected according to the dependency syntax structure of each sentence to be detected and the part of speech and named entity type of each word in each sentence to be detected comprises:
performing the following speech analysis operation on each sentence to be detected in the text to be detected to obtain at least one speech event;
wherein the speech analysis operation comprises: extracting the speech trigger word and the central figure in each sentence to be detected to obtain a preliminary speech event; and extracting modifier words of the speech trigger word and adding the modifier words to the preliminary speech event to obtain at least one speech event; wherein the speech trigger word is a verb and hits the speech trigger-word dictionary; the central figure has a part of speech of noun or pronoun, has a named entity type of person entity, and stands in a subject-verb dependency relation to the speech trigger word; and each modifier word misses the speech trigger-word dictionary and stands in an adverbial-head dependency relation to the speech trigger word.
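For illustration only, outside the claims: a sketch of the speech analysis operation over one parsed sentence, assuming LTP-style relation tags (SBV for subject-verb, ADV for adverbial-head) and a flat token schema; the tag names and the schema are stand-ins for whatever the preset tool emits.

    def extract_speech_events(tokens, trigger_dict):
        # tokens: list of dicts, e.g. {"id": 2, "word": "说", "pos": "v",
        #         "ner": "O", "head": 0, "rel": "HED"} (assumed schema).
        events = []
        for t in tokens:
            # Speech trigger word: a verb that hits the trigger-word dictionary.
            if t["pos"] != "v" or t["word"] not in trigger_dict:
                continue
            for c in tokens:
                # Central figure: noun/pronoun person entity in a subject-verb (SBV) relation.
                if (c["head"] == t["id"] and c["rel"] == "SBV"
                        and c["pos"] in ("n", "r") and c["ner"] == "PERSON"):
                    event = {"figure": c["word"], "trigger": t["word"], "modifiers": []}
                    for m in tokens:
                        # Modifier word: adverbial-head (ADV) dependent that misses the dictionary.
                        if (m["head"] == t["id"] and m["rel"] == "ADV"
                                and m["word"] not in trigger_dict):
                            event["modifiers"].append(m["word"])
                    events.append(event)
        return events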
6. The method for detecting a speech position according to claim 2, wherein determining the central speech of each figure according to the text topic and the figure-speech list comprises:
performing the following calculation operation on each figure's speech and the text topic to determine the central speech of each figure;
wherein the calculation operation comprises:
calculating the similarity between each speech of a target figure and the text topic, and determining the speech with the highest similarity as the central speech of the target figure; wherein the target figure is any one of the figures.
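For illustration only, outside the claims: the calculation operation reduces to an argmax over topic similarity; a sketch, reusing cosine similarity over vectors from the same assumed text encoder as above.

    import numpy as np

    def central_speech(speeches, topic, encode):
        # speeches: the target figure's speech list; encode: text -> vector (assumed model).
        v_topic = encode(topic)
        def sim(s):
            v = encode(s)
            return float(v @ v_topic / (np.linalg.norm(v) * np.linalg.norm(v_topic) + 1e-12))
        return max(speeches, key=sim)   # the speech most similar to the text topic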
7. The method for detecting a speech position according to claim 1, wherein acquiring the first semantic matrix vector and the at least one second semantic matrix vector comprises:
acquiring at least one position label; wherein the position labels include support, opposition, and neutrality;
and concatenating the figure's speech with the at least one position label and inputting the result into a semantic conversion model to obtain the first semantic matrix vector and the at least one second semantic matrix vector; wherein the semantic conversion model converts text into its corresponding semantic matrix vector.
8. The method for detecting a speech position according to claim 1, wherein calculating the target similarity between the first semantic matrix vector and each second semantic matrix vector comprises:
determining the cosine similarity between the first semantic matrix vector and each second semantic matrix vector as the target similarity.
9. A speech position detection apparatus, comprising:
an extraction module, configured to acquire a text to be detected and extract a figure's speech from the text to be detected;
an acquisition module, configured to acquire a first semantic matrix vector and at least one second semantic matrix vector; wherein the first semantic matrix vector is a semantic representation vector corresponding to the figure's speech, each second semantic matrix vector is a semantic representation vector corresponding to one of at least one position label, and each position label represents one possible position toward a preset position target;
a calculation module, configured to calculate a target similarity between the first semantic matrix vector and each second semantic matrix vector;
and a determination module, configured to determine the position label corresponding to the second semantic matrix vector with the maximum target similarity as the position of the figure's speech toward the preset position target.
10. An electronic device, comprising: a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the method for detecting a speech position according to any one of claims 1 to 8.
11. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for detecting a speech position according to any one of claims 1 to 8.
CN202410008601.0A 2024-01-03 2024-01-03 Method, device and storage medium for detecting speech position Pending CN117828081A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410008601.0A CN117828081A (en) 2024-01-03 2024-01-03 Method, device and storage medium for detecting speech position

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410008601.0A CN117828081A (en) 2024-01-03 2024-01-03 Method, device and storage medium for detecting speech position

Publications (1)

Publication Number Publication Date
CN117828081A 2024-04-05

Family

ID=90522551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410008601.0A Pending CN117828081A (en) 2024-01-03 2024-01-03 Method, device and storage medium for detecting speech position

Country Status (1)

Country Link
CN (1) CN117828081A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination