CN116361416A

CN116361416A - Speech retrieval method, system and medium based on semantic analysis and high-dimensional modeling

Info

Publication number: CN116361416A
Application number: CN202310131020.1A
Authority: CN
Inventors: 陈开冉; 黎展; 陶峰; 卢运福; 黄平东
Original assignee: Guangzhou Tungee Technology Co ltd
Current assignee: Guangzhou Tungee Technology Co ltd
Priority date: 2023-02-17
Filing date: 2023-02-17
Publication date: 2023-06-30

Abstract

The invention discloses a voice retrieval method, system and medium based on semantic analysis and high-dimensional modeling. The method comprises the following steps: acquiring voice text data; constructing semantic tags and defining a data model for the text data, and synchronizing the text data into a search database through data synchronization; determining a search text and screening conditions, and performing high-screen processing on the text data; and executing text recall and generating a final recall candidate set after sorting and filtering. The invention combines the respective strengths of three recall modes and improves the accuracy of text recall as far as possible. By introducing retrieval based on multidimensional semantic tags and text length, the method realizes recall based on context information, solves the problem that existing text recall methods cannot take context information into account, and improves retrieval efficiency while ensuring retrieval accuracy.

Description

Speech retrieval method, system and medium based on semantic analysis and high-dimensional modeling

Technical Field

The invention relates to the technical field of computers, in particular to a voice retrieval method, a voice retrieval system and a voice retrieval medium based on semantic analysis and high-dimensional modeling.

Background

With the development of natural language processing, text retrieval is applied in more and more scenarios. In such scenarios, the system takes the text entered by a user, analyzes the user's intention, parses the entities, referring words and the like in the input text, and then retrieves response information; alternatively, it matches existing documents by methods such as FAQ text matching and displays the documents the user needs. The main goal of text recall is to identify the user's actual search intent from the search text entered by the user and to return relevant results. The higher the relevance of the returned results to the search text, the better the text recall. Current text recall technology has the following defects:

(1) Approaches that analyze customer intention and parse entities require the business intentions to be summarized and every existing document to be filed and classified, which is a heavy workload. When an intention is added or modified, the intentions must be sorted out again, which makes development and modification difficult.

(2) Text matching approaches match against existing questions, which reduces the time needed to add and modify existing knowledge. In a multi-round dialogue, however, the preceding and following sentences are still not considered. For example, the customer asks a question about a line in one sentence, the agent answers, and the customer then asks a follow-up question whose meaning is only complete when combined with the preceding context; existing text recall methods cannot take this context information into account.

(3) The knowledge graph-based method requires a great deal of effort to maintain knowledge of the entire graph.

(4) The call data in existing schemes is often only basic call data with simple dimensions. It cannot support complex query and analysis requirements, and no rich semantic labels are defined for each call session or each sentence. Because the underlying data is poor, the query results are naturally poor as well.

(5) In existing schemes, the code for data synchronization is often inserted directly into the service code. This pollutes the service logic and introduces a latent risk: a fault on one side also affects the other.

(6) Existing schemes lack support for multi-dimensional high-screen condition queries with keywords and key sentences, and have no design for querying different data models according to different query conditions.

(7) For text queries, existing schemes have no word and sentence association or synonym expansion; they only perform simple text matching or fuzzy search on the input text, so search results are limited, the recall rate is low, and precision is insufficient. Moreover, existing schemes usually search along a single path, have insufficient concurrency, and lack pre-computed search and caching, so performance is low and cannot support retrieval over tens of millions or hundreds of millions of records.

Disclosure of Invention

To solve at least one of the above technical problems, a first aspect of the present invention discloses a speech retrieval method based on semantic analysis and high-dimensional modeling, the method comprising:

obtaining voice data and translating the voice data into text data;

constructing a basic database, and storing text data in the basic database;

constructing a search database, and establishing data synchronization between a basic database and the search database;

during data synchronization, constructing multidimensional semantic tags for the text data according to service requirements, defining a text data model based on an application scene of the text, and creating screening conditions of the text data through the semantic tags and the text data model;

determining search text and screening conditions;

performing high-screen processing on text data in a search database based on the screening conditions, and performing text recall based on the text data subjected to the high-screen processing to obtain a first recall candidate set corresponding to the search text;

text recall is carried out on the search text based on semantic embedding of the search text, and a second recall candidate set corresponding to the search text is obtained;

text recall is carried out on the search text based on the context embedding of the search text, and a third recall candidate set corresponding to the search text is obtained;

Sorting and filtering the first recall candidate set, the second recall candidate set and the third recall candidate set based on the keywords of the search text;

and determining a final recall candidate set corresponding to the search text based on the first recall candidate set, the second recall candidate set and the third recall candidate set after the sorting processing and the filtering processing.

In an alternative embodiment, recall of the second and third recall candidate sets is performed by a dual-tower model, wherein the dual-tower model includes a search text tower and a document tower;

when the second recall candidate set is recalled, the characterization models of the search text tower and the document tower are of the same structure and share weight;

when recalling the third recall candidate set, the document tower is used for converting a plurality of inputs of the characterization model into a single result output through BiLSTM, and the characterization models of the search text tower and the document tower are of different structures and have no shared weight.

In an optional embodiment, the performing text recall based on the text data after the high-screen processing to obtain a first recall candidate set corresponding to the search text includes:

querying a preset database according to the search text to obtain a candidate set of the search text;

and screening the candidate set based on a BM25 algorithm to obtain a first recall candidate set corresponding to the search text.

In an alternative embodiment, the sorting process includes:

extracting keywords in the search text through TextRank, performing semantic embedding on the keywords to the first recall candidate set, and then merging the keywords into the second recall candidate set and the third recall candidate set.

In an alternative embodiment, the establishing the data synchronization of the base database and the search database includes:

monitoring operation events of the basic database, wherein the operation events comprise adding a database table, updating the database table and deleting the database table;

if an operation event of the basic database is monitored, executing data synchronization through middleware, wherein the middleware is used for synchronizing heterogeneous data between the basic database and the search database;

and performing text data mapping according to the field names and types of the target database tables, and synchronizing the data content of the basic database into the database tables of the corresponding retrieval database.

In an alternative embodiment, the building the multi-dimensional semantic tag on the text data according to the service requirement includes:

and creating a corresponding semantic tag for each group of call data through a preset label creating model, and classifying and storing each group of call data according to the semantic tag corresponding to each group of call data.

In an alternative embodiment, the defining a text data model based on an application scene of the text includes:

defining a data model hierarchy according to the application scene of the text, wherein the data model hierarchy comprises a text-level data model, a paragraph-level data model, a sentence-level data model and a session-level data model, the application scene of the text is represented by the hierarchical text data model, and the semantic tags are stored in the hierarchical text data model as semantic representation vectors.

In an alternative embodiment, the step of performing high-screening processing on the text data in the search database based on the screening condition includes:

when semantic searching is carried out on the database based on the search text, carrying out synonym expansion and/or sentence association on the search text;

invoking a preset algorithm model to judge words and sentences of the search text;

if the search condition is a keyword search, retrieving results from the data model by phrase matching, using the multi-dimensional high-screen query condition, the keywords and the sorting condition;

if the search condition is a multi-dimensional high-screen condition query with a key sentence, starting two concurrent query paths to retrieve data quickly;

wherein the first concurrent query path directly retrieves, from the data model, the top K pieces of data with the highest matching degree using the multi-dimensional high-screen query condition and the key sentence, scored by the BM25 relevance scoring algorithm; and the second concurrent query path first calls an algorithm model to calculate the semantic characterization vector of the sentence, and then uses the semantic characterization vector together with the multi-dimensional high-screen query condition to retrieve the top K pieces of data from the sentence-level data model through an ANN algorithm;

the query results are merged and sorted across the paths to obtain M pieces of data, the top N pieces of data are selected, unmatched data among the N pieces is eliminated, and the matched result data is returned and output.
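The two concurrent query paths and the merge step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `two_path_search`, the callables standing in for the BM25 path and the ANN path, the "keep the best score per id" merge rule, and the assumption that both paths return scores on a comparable scale are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, float]  # (sentence id, score assumed comparable across paths)


def two_path_search(lexical: Callable[[], List[Hit]],
                    vector: Callable[[], List[Hit]],
                    top_n: int,
                    keep: Callable[[str], bool]) -> List[str]:
    """Run both retrieval paths concurrently, merge, cut to top N, then filter."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        lexical_future = pool.submit(lexical)   # stands in for the BM25 path
        vector_future = pool.submit(vector)     # stands in for the ANN path
        hits = lexical_future.result() + vector_future.result()
    # Merge: keep the best score seen for each id (an assumed merge rule).
    best: Dict[str, float] = {}
    for doc_id, score in hits:
        best[doc_id] = max(score, best.get(doc_id, float("-inf")))
    ranked = sorted(best, key=lambda d: best[d], reverse=True)[:top_n]
    # The predicate stands in for whatever step eliminates unmatched data.
    return [doc_id for doc_id in ranked if keep(doc_id)]
```

In this sketch the M merged results are the keys of `best`, and `top_n` plays the role of N; in practice the two paths would issue real queries against the search database instead of returning fixed lists.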

The second aspect of the invention discloses a voice retrieval system based on semantic analysis and high-dimensional modeling, the system comprises:

the data acquisition module is used for acquiring voice data and translating the voice data into text data;

the basic database module is used for constructing a basic database and storing text data in the basic database;

the data synchronization module is used for constructing a search database, establishing data synchronization between the basic database and the search database, constructing multidimensional semantic tags for the text data according to service requirements, defining a text data model based on an application scene of the text, and creating screening conditions of the text data through the semantic tags and the text data model;

the text screening module is used for determining search texts and screening conditions, and performing high-screening processing on text data in a search database based on the screening conditions;

the first recall module is used for carrying out text recall based on the text data after the high-screen processing to obtain a first recall candidate set corresponding to the search text;

the second recall module is used for carrying out text recall on the search text based on semantic embedding of the search text to obtain a second recall candidate set corresponding to the search text;

the third recall module is used for carrying out text recall on the search text based on the context embedding of the search text, and obtaining a third recall candidate set corresponding to the search text;

the processing module is used for carrying out sorting processing and filtering processing on the first recall candidate set, the second recall candidate set and the third recall candidate set based on the keywords of the search text;

And the determining module is used for determining a final recall candidate set corresponding to the search text based on the first recall candidate set, the second recall candidate set and the third recall candidate set after the sorting processing and the filtering processing.

A third aspect of the present invention discloses a computer storage medium storing computer instructions for performing part or all of the steps of the semantic analysis and high-dimensional modeling based speech retrieval method disclosed in the first aspect of the present invention when the computer instructions are invoked.

Compared with the prior art, the embodiment of the invention has the following beneficial effects:

In the embodiment of the invention, text data translated from voice call data is first obtained and stored in a basic database. A search database is then built and data synchronization is established between the basic database and the search database; during data synchronization, screening conditions for text data retrieval are generated from the multidimensional semantic tags and the text data model. After the user determines the search text and the screening conditions required for retrieval, the corresponding text data in the search database is screened by high-screen processing. Text recall based on statistical data, text recall based on semantics and text recall based on context association are then carried out separately to generate recall candidate sets, and the final recall candidate set corresponding to the search text and the screening conditions is obtained through sorting and filtering.

This embodiment realizes semantics-based recall, which improves the quality of text recall, and recall based on the context embedding of the search text, which solves the problem that existing retrieval cannot take context information into account. It combines the respective strengths of the three recall modes and improves the accuracy of text recall as far as possible while preserving recall performance. In addition, the high-screen design based on multi-dimensional semantic tags and the text data model enables fast and accurate voice retrieval over data volumes of hundreds of millions of records or more. Multi-path concurrent retrieval, multi-path result merging and sorting, precise filtering by an algorithm, and pre-search computation with caching are also introduced, which greatly improves retrieval efficiency while ensuring retrieval accuracy.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow diagram of a speech retrieval method based on semantic analysis and high-dimensional modeling according to an embodiment of the present invention;

FIG. 2 is a flow chart of a semantic high-screen query disclosed by an embodiment of the present invention;

FIG. 3 is a flow chart of a pre-computed asynchronous task disclosed in an embodiment of the present invention;

FIG. 4 is a flow chart of loading next page data as disclosed in an embodiment of the present invention;

FIG. 5 is a schematic diagram of a speech retrieval system based on conversational semantic analysis and high-dimensional modeling according to an embodiment of the invention.

Detailed Description

In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The terms first, second and the like in the description, in the claims and in the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, product, or device that comprises a list of steps or elements is not limited to only those listed but may optionally include other steps or elements not listed or inherent to such process, method, product, or device.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

The invention discloses a voice retrieval method based on semantic analysis and high-dimensional modeling. Text data translated from voice call data is first obtained and stored in a newly built basic database. A search database is then established, data synchronization is set up between the basic database and the search database, and multi-dimensional semantic tags and a text data model are built during data synchronization. At retrieval time, screening conditions for text data retrieval are generated from the semantic tags and the text data model. After the user determines the search text and the screening conditions required for retrieval, the corresponding text data in the search database is screened by high-screen processing. Recall candidate sets are then generated separately by text recall based on statistical data, text recall based on semantics and text recall based on context association, and the final recall candidate set corresponding to the search text and the screening conditions is obtained through sorting and filtering.

The method can realize recall based on semantics, can improve the quality of text recall, can realize recall based on context embedding of search text, can solve the problem that the existing search cannot consider the context information, can synthesize the respective characteristics of three recall modes, and can improve the accuracy of text recall as much as possible under the condition of ensuring the performance of text recall.

Example 1

Referring to fig. 1, fig. 1 is a schematic flow chart of a voice retrieval method based on semantic analysis and high-dimensional modeling according to an embodiment of the present invention. As shown in fig. 1, the voice retrieval method based on semantic analysis and high-dimensional modeling may include the following operations:

101. voice data is obtained and translated into text data.

In this embodiment, the voice data may be from a third party system, such as a calling system, a customer service center, etc., and the process of translating the voice data into text data may be implemented by a translation program or a translation script.

102. A base database is constructed and text data is stored in the base database.

In this embodiment, the base database may be any database capable of storing text data, for example a MongoDB database or a MySQL database. To store text data, the base database may be connected to the third-party system, and the text data may be stored by importing it.

103. A search database is constructed, and data synchronization is established between the basic database and the search database.

In this embodiment, the search database may be an Elasticsearch database capable of full-text search.

Text data is mapped according to the field names and types of the target database tables, and the data content of the basic database is synchronized into the corresponding tables of the search database, thereby providing data support for the subsequent high-screen design.

Specifically, this is a synchronization of heterogeneous data. After call data enters the system from a third-party system such as a calling system, the basic call data is first stored in a MongoDB or MySQL database. Once storage is complete, the data needs to be synchronized to the Elasticsearch full-text database to provide data support for the subsequent high-screen design. The synchronization is carried out by self-developed middleware, which performs well and is stable and reliable. The principle is that operation events of the MongoDB database are monitored: when an operation such as an insert, update or delete is detected on a database table, data synchronization is triggered, and the data content of the MongoDB table is automatically synchronized into the corresponding Elasticsearch table according to a user-defined mapping to the field names and types of the target table.
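The event-driven synchronization with field mapping can be illustrated by the following minimal sketch. It uses plain dictionaries in place of the real databases, and everything named here is an assumption made for illustration: the event shape (`op`, `key`, `doc`), the `FIELD_MAP` table, and the target field names are hypothetical and do not come from the patent or from any database driver.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical mapping: base-database field -> (search-database field, converter).
FIELD_MAP: Dict[str, Tuple[str, Callable[[Any], Any]]] = {
    "_id": ("conversation_id", str),
    "text": ("content", str),
    "duration": ("duration_seconds", int),
}


def map_document(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Rename and convert the fields of a base-database document."""
    mapped = {}
    for source_name, (target_name, convert) in FIELD_MAP.items():
        if source_name in doc:
            mapped[target_name] = convert(doc[source_name])
    return mapped


def apply_event(event: Dict[str, Any],
                search_index: Dict[str, Dict[str, Any]]) -> None:
    """Apply one insert/update/delete event to the in-memory search index."""
    key = str(event["key"])
    if event["op"] in ("insert", "update"):
        search_index[key] = map_document(event["doc"])
    elif event["op"] == "delete":
        search_index.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {event['op']}")
```

Keeping the mapping in a table separate from the service code reflects the stated goal of not mixing synchronization logic into business logic; a real middleware would receive the events from the base database and write to the search database instead of a dictionary.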

104. And in the data synchronization process, a multidimensional semantic tag is constructed for the text data according to service requirements, a text data model is defined based on an application scene of the text, and screening conditions of the text data are created through the semantic tag and the text data model.

105. And determining search text and screening conditions.

In this embodiment, before searching, the user may customize the screening conditions and the search question according to the content to be retrieved. The search text, which includes the search keywords, is generated from the search question, and the search database completes the voice retrieval using the input screening conditions and search text, so as to obtain the voice text information the user requires.

106. Performing high-screen processing on text data in a search database based on the screening conditions, and performing text recall based on the text data subjected to the high-screen processing to obtain a first recall candidate set corresponding to the search text;

The first recall path can be understood as recall based on statistical information. The purpose of recall is to quickly obtain answers that match the user's query (i.e., the search text). Recall based on statistics may not be highly accurate, but it performs relatively well. The BM25 algorithm may be used for this recall.

Specifically, all candidates can be imported into an Elasticsearch database, which automatically calculates the statistical information of all documents. After the user's query is obtained, it is used to query the database with the match syntax, and the top k candidates with the highest BM25 score against the query are taken as the first recall candidate set.
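To make the statistics-based scoring concrete, the following is a small self-contained BM25 sketch in pure Python. It is only an illustration of the kind of score a full-text search engine computes internally, not the engine's code: the function name `bm25_top_k`, the pre-tokenized inputs, the particular idf variant, and the parameter values k1 = 1.2 and b = 0.75 are assumptions, and the corpus is assumed to be non-empty.

```python
import math
from collections import Counter
from typing import List, Tuple


def bm25_top_k(query: List[str], docs: List[List[str]], k: int,
               k1: float = 1.2, b: float = 0.75) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k best BM25 matches."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scored = []
    for idx, doc in enumerate(docs):
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalisation penalises documents longer than average.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scored.append((idx, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Because the score depends only on term counts and document lengths, this path is fast but blind to meaning, which is the limitation the second recall path addresses.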

107. And carrying out text recall on the search text based on semantic embedding of the search text, and obtaining a second recall candidate set corresponding to the search text.

The second recall path can be understood as sentence-embedding recall based on contrastive learning. The statistics-based recall of the first path cannot understand the semantics of sentences, so a semantics-based recall method, namely the second path, is needed. The invention uses the unsupervised part of ESimCSE to train the double-tower model for recall.

Specifically, one piece of data is randomly selected from all documents as the anchor, another piece is randomly selected as the negative sample, and a word in the anchor is randomly repeated to produce the positive sample. Triplet data is constructed in this way, RoBERTa is chosen as the backbone, and training uses a contrastive loss. After training, recall is performed by vector retrieval.
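The triplet construction can be sketched as below. This is a minimal illustration under stated assumptions: the helper names `repeat_random_word` and `build_triplet` are hypothetical, sentences are assumed to be non-empty, the corpus is assumed to contain at least two sentences, and words are split on whitespace (Chinese text would need a word segmenter instead).

```python
import random
from typing import List, Tuple


def repeat_random_word(sentence: str, rng: random.Random) -> str:
    """Duplicate one randomly chosen word in place to form a positive sample."""
    words = sentence.split()
    pos = rng.randrange(len(words))
    return " ".join(words[:pos + 1] + [words[pos]] + words[pos + 1:])


def build_triplet(corpus: List[str], rng: random.Random) -> Tuple[str, str, str]:
    """Return (anchor, positive, negative) drawn from the corpus."""
    anchor_idx = rng.randrange(len(corpus))
    # The negative is any other sentence, chosen uniformly at random.
    negative_idx = rng.choice([i for i in range(len(corpus)) if i != anchor_idx])
    anchor = corpus[anchor_idx]
    return anchor, repeat_random_word(anchor, rng), corpus[negative_idx]
```

Passing an explicit `random.Random` instance keeps the sampling reproducible; the resulting triplets would then be fed to an encoder trained with a contrastive loss, which is outside the scope of this sketch.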

108. And carrying out text recall on the search text based on the context embedding of the search text, and obtaining a third recall candidate set corresponding to the search text.

The third recall path can be understood as context-based recall based on contrastive learning. A user's query sometimes contains several questions or is phrased with rich information, for example "How accurate is your hardware line for dialing Beijing numbers?". In real call records, however, the pieces of key information are often not in the same sentence but spread across the context, so sentence-level embeddings are difficult to match; that is, the second recall path performs poorly in this case.

After the representation of each sentence has been learned in step 107, information-rich queries and their corresponding document answers are constructed, and a heterogeneous double-tower model is further trained on the basis of step 107 through supervised contrastive learning. The parameters of the query tower (i.e., the search text tower) and of the document tower are not shared. The current sentence and its context sentences are input into the document tower, all sentences are processed by a BiLSTM, and the BiLSTM representation at the position of the current sentence is selected through a fully connected layer, giving the context-aware representation of the current sentence. The subsequent steps are the same as in step 107.

109. And sorting and filtering the first recall candidate set, the second recall candidate set and the third recall candidate set based on the keywords of the search text.

In an alternative embodiment, the sorting process includes:

The ranking may be based on keyword attention. Keywords in the query and in the documents are extracted by the TextRank method and appended to the original query; word-level embedding is additionally performed on the appended keywords, and the result is merged into the original sentence-level and context-level embeddings by pooling. After the first, second and third recall candidate sets have been ranked, the candidates within a certain range of the ranking (e.g., top 100 or top 10) may be selected as the final recall candidate set. Some candidates may also be filtered out according to the ranking.
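The text does not say which pooling operation merges the keyword embeddings into the sentence-level or context-level embedding. As one possible reading, the sketch below uses plain mean pooling over the sentence vector and the keyword vectors; the function name `pool_with_keywords` and the choice of mean pooling are assumptions, and the vectors are assumed to have already been produced by some embedding model.

```python
from typing import List, Sequence


def pool_with_keywords(sentence_vec: Sequence[float],
                       keyword_vecs: List[Sequence[float]]) -> List[float]:
    """Mean-pool a sentence vector together with its keyword vectors."""
    vectors = [sentence_vec] + list(keyword_vecs)
    dim = len(sentence_vec)
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    # Average each coordinate over the sentence vector and all keyword vectors.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

With no keywords the function returns the sentence vector unchanged, so candidates without extracted keywords are still comparable with the rest.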

The system defines a conversation-level tagged data model, Conversation, and a sentence-level tagged data model, Sentence, based on an Elasticsearch database with semantic vector search capability. In addition to the basic call data dimensions, the conversation-level data model Conversation contains dimension data calculated by a rating model and a label model; when a multi-dimensional high-screen query without key sentences or keywords is performed, data can be retrieved quickly from Conversation and returned, relying on the high-performance query capability of the Elasticsearch database. The sentence-level data model Sentence likewise adopts a multi-label dimension design and redundantly stores part of the dimension data of Conversation; it also stores a semantic vector dimension, so that results can be returned accurately and quickly when a multi-dimensional high-screen query with key sentences is performed. The semantic vector of a sentence is obtained by calling an algorithm service, and this semantic characterization vector is stored in the sentence-level data model Sentence.
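The relationship between the two data models can be sketched with plain data classes. Only the model names Conversation and Sentence come from the description; every field name below, and the helper `to_sentence`, is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Conversation:
    """Conversation-level record: basic call dimensions plus computed labels."""
    conversation_id: str
    company_id: str
    call_date: str               # e.g. "20221201"
    duration_seconds: int
    rating: str = ""             # to be filled in by a rating model
    tags: Dict[str, str] = field(default_factory=dict)  # filled in by tag models


@dataclass
class Sentence:
    """Sentence-level record: redundant conversation dimensions plus a vector."""
    sentence_id: str
    conversation_id: str
    text: str
    company_id: str              # copied from the parent conversation
    call_date: str               # copied from the parent conversation
    tags: Dict[str, str] = field(default_factory=dict)
    embedding: List[float] = field(default_factory=list)  # semantic vector


def to_sentence(conv: Conversation, sentence_id: str, text: str,
                embedding: List[float]) -> Sentence:
    """Build a Sentence that carries its parent's filterable dimensions."""
    return Sentence(sentence_id, conv.conversation_id, text,
                    conv.company_id, conv.call_date,
                    dict(conv.tags), list(embedding))
```

Copying the filterable dimensions onto each sentence is what lets a sentence-level query apply the screening conditions and the vector search in a single lookup, at the cost of some duplicated storage.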

Specifically, by creating custom tag models, rich semantic tags can be built for each call session or each sentence. Each tag model defines the condition content to be monitored, the scope of application, and a group of multi-dimensional high-screen conditions. For example, monitoring keywords or key sentences in the call text can produce a business attribute tag for the call session; emotion recognition on the caller's and callee's speech can produce an emotion tag; calculating the speaking rate can produce a speaking-rate tag; voice endpoint detection can produce a tag indicating whether there was an awkward silence; and matching and screening multiple dimensions of data can produce a call rating tag. By analogy, rich semantic tags that meet the user's service requirements can be constructed through custom tag models.

Constructing tags necessarily involves a large amount of data calculation. To make data screening and tag calculation more efficient, a data storage scheme is designed that allows data to be screened quickly for tag calculation. Its core is to store the data by category, which replaces real-time queries against the database and reduces the impact on the service system. The principle is illustrated by a rating calculation that screens data by call time and call duration. The id of the enterprise to which the data belongs is used as the top-level directory and the call time (generally the date) as the next-level directory; the call text and the dimension data that need to be used are stored together in a file. The file is named after a call duration range, for example 0-15.txt, meaning that call data with a call duration of 0-15 s is stored in the file 0-15.txt, with the dimension data separated by \t.

The complete directory structure is, for example, company_id/20221201/0-15.txt, and a line of the txt file has the form: conversation_id\tsentence_id\ttext content\tcalled region.
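The directory layout and the tab-separated line format can be sketched as follows. Only the 0-15 bucket and the company/date/bucket path shape come from the description; the other duration buckets, the overflow bucket name, and all function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

# Hypothetical duration buckets in seconds; only "0-15" appears in the text.
BUCKETS: List[Tuple[int, int]] = [(0, 15), (16, 60), (61, 300)]


def bucket_name(duration_seconds: int) -> str:
    """Return the file stem of the bucket that contains the duration."""
    for low, high in BUCKETS:
        if low <= duration_seconds <= high:
            return f"{low}-{high}"
    return f"{BUCKETS[-1][1] + 1}-plus"   # hypothetical overflow bucket


def data_path(company_id: str, call_date: str, duration_seconds: int) -> PurePosixPath:
    """Build company_id/date/bucket.txt for one call."""
    return PurePosixPath(company_id) / call_date / f"{bucket_name(duration_seconds)}.txt"


def format_line(conversation_id: str, sentence_id: str, text: str, region: str) -> str:
    """Join the dimension values with tabs, one sentence per line."""
    return "\t".join([conversation_id, sentence_id, text, region])


def parse_line(line: str) -> List[str]:
    """Split a stored line back into its dimension values."""
    return line.rstrip("\n").split("\t")
```

For instance, `data_path("company_id", "20221201", 12)` yields the path `company_id/20221201/0-15.txt`. The sketch assumes the text itself contains no tab characters; real call text would need escaping or cleaning before being written.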

When the rating of call data is calculated, the top-level directory holding the data is first located by the id of the enterprise to which the data belongs. Next, according to the call-time range to be screened, the one or more subdirectories that fall within that range are located. The data files in those subdirectories whose names match the required call duration are then found one by one, their contents are read iteratively, and the rating tag is calculated for each record. To speed up the calculation, the files found are divided into batches so that the batches can be processed concurrently.
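A sketch of this lookup-and-batch step is given below, assuming the directory layout described earlier (enterprise id, then a YYYYMMDD date directory, then files named `low-high.txt`). The function names, the use of a thread pool, and the placeholder `rate_file` (which only counts rows) are illustrative assumptions; the actual rating logic is not specified in the text.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def find_files(root, company_id, date_from, date_to, min_s, max_s):
    """Collect files whose date directory and duration range fall in the filter."""
    base = os.path.join(root, company_id)
    matches = []
    for day in sorted(os.listdir(base)):
        # YYYYMMDD strings sort in date order, so string comparison is enough.
        if not (date_from <= day <= date_to):
            continue
        for name in sorted(os.listdir(os.path.join(base, day))):
            low, high = name[: -len(".txt")].split("-")
            if int(low) >= min_s and int(high) <= max_s:
                matches.append(os.path.join(base, day, name))
    return matches


def rate_file(path):
    """Placeholder for the rating calculation: counts non-empty rows."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def rate_in_batches(paths, batch_size=2, workers=4):
    """Split the files into batches and process the batches concurrently."""
    batches = [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]

    def run(batch):
        return [rate_file(path) for path in batch]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run, batches))
    return [value for batch in results for value in batch]
```

`pool.map` preserves batch order, so the returned values line up with the input file list.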

FIG. 2 is a flow chart of a semantic high-screen query disclosed by an embodiment of the present invention, FIG. 3 is a flow chart of a pre-computed asynchronous task disclosed by an embodiment of the present invention, and FIG. 4 is a flow chart of loading next page data disclosed by an embodiment of the present invention.

As shown in FIGS. 2, 3 and 4, in an alternative embodiment, the method further comprises:

When a user performs a semantic search by entering a keyword, or a keyword together with multi-dimensional high-screen query conditions, as shown in FIG. 2, the system first expands the query by associating similar sentences or synonyms with the keyword, and then calls an algorithm model to judge whether the input is a word or a sentence. If it is a plain keyword search, results are retrieved directly from the session-level data model Conversation using the multi-dimensional high-screen query conditions, the keyword and the sorting conditions through the phrase-matching query match_phrase, and the top K records are returned directly to the user. If the search is a multi-dimensional high-screen query with a key sentence, the data is retrieved quickly through two concurrent paths. One path combines the multi-dimensional high-screen query conditions with the BM25 relevance scoring algorithm to retrieve the top K best-matching records from the sentence-level data model Sentence. The other path first calls an algorithm model to compute the semantic representation vector of the sentence, and then uses that vector together with the multi-dimensional high-screen query conditions to retrieve the top K records from the sentence-level data model Sentence through an ANN (approximate nearest neighbor) algorithm. After both paths have returned, their results are merged and sorted to obtain M records, the top N records are selected, the algorithm fine-ranking model is called to remove the unmatched records among those N, and the matched records are returned directly to the user.

In addition, to speed up retrieval of the next page, asynchronous pre-calculation and caching are introduced. As shown in FIG. 3, after the top N records have been taken from the M merged and sorted records, the remaining records are placed in cache C1 and a pre-calculation asynchronous task is started. The asynchronous task takes N records from cache C1, updates cache C1 with the remaining records, calls the algorithm fine-ranking model to remove the unmatched records among those N, and stores the matched records in cache C2. As shown in FIG. 4, when the user requests the next page, the data is read directly from cache C2 and returned.
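The merge step and the two caches can be sketched as below. Several things here are assumptions made for illustration: scores from the two paths are assumed to be already normalized to a comparable scale, `is_match` stands in for the fine-ranking model, the pre-calculation runs inline rather than as an asynchronous task, and the next page is precomputed again after each read (the text does not say when the task is re-triggered).

```python
def merge_results(bm25_hits, ann_hits):
    """Merge two recall lists of (doc_id, score), keeping the best score per id."""
    best = {}
    for doc_id, score in list(bm25_hits) + list(ann_hits):
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


class PagedSearch:
    """Serve pages of N records using cache C1 (pending) and cache C2 (ready)."""

    def __init__(self, merged, page_size, is_match):
        self.page_size = page_size
        self.is_match = is_match  # stand-in for the fine-ranking model
        self.c1 = list(merged)    # cache C1: merged records not yet fine-ranked
        self.c2 = []              # cache C2: fine-ranked records for the next page

    def _take_page(self):
        head, self.c1 = self.c1[:self.page_size], self.c1[self.page_size:]
        return [doc_id for doc_id, _ in head if self.is_match(doc_id)]

    def precompute(self):
        """Asynchronous in the described system; run inline in this sketch."""
        self.c2 = self._take_page()

    def first_page(self):
        page = self._take_page()
        self.precompute()
        return page

    def next_page(self):
        page, self.c2 = self.c2, []
        self.precompute()
        return page
```

With this split, the cost of the fine-ranking call for page two is paid while the user is still reading page one, so `next_page` only reads from C2.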

Through the above process, an efficient and accurate voice retrieval system is built, covering heterogeneous data synchronization, construction of rich semantic tags, a high-screen design based on tagged session data, and fast, accurate semantic search over data at the scale of hundreds of millions of records.
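For the query flow described above, the choice of search path and the request body for the keyword branch might look like the following sketch. The `bool`, `match_phrase` and `term` clauses are standard Elasticsearch query DSL; the field names (`text`, `company_id`), the strategy labels and both function names are hypothetical and not taken from the source.

```python
def choose_strategy(query_text, is_sentence):
    """Pick a search path from the query type, following the described flow."""
    if not query_text:
        return "filter_only"      # high-screen conditions only, exact query
    if is_sentence:
        return "bm25_plus_ann"    # two concurrent recall paths
    return "match_phrase"         # keyword search by phrase matching


def build_keyword_query(keyword, filters, size=10, field="text"):
    """Build a request body: phrase match on the text plus exact-term filters."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match_phrase": {field: keyword}}],
                "filter": [
                    {"term": {name: value}} for name, value in filters.items()
                ],
            }
        },
    }
```

Placing the high-screen conditions in the `filter` clause keeps them out of relevance scoring, so only the phrase match influences the ranking.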

The embodiment of the invention has at least the following beneficial effects:

(1) Sentence semantics are represented using unsupervised contrastive learning, so candidate texts can be recalled with high quality without a large investment in labeling. On this basis, a BiLSTM propagates the sentence representations learned by unsupervised contrastive learning to other sentences, so that the representation of each sentence also contains context information.

(2) In multi-round conversation text retrieval, a BiLSTM is introduced to learn context information automatically from the multi-round text, which naturally handles the long-text problem; the BiLSTM mechanism can focus on the information of the current round of conversation and use the context information selectively.

(3) In multi-round conversation text retrieval, context text information is handled by making full use of the sentence representations trained in an unsupervised manner and then fine-tuning with supervised data. In this case the parameters of the query tower and the document tower are not shared, so the model is a heterogeneous double-tower structure.

(4) Automatic synchronization of heterogeneous data. The self-developed synchronization middleware defines the field-name and type mappings to the target database and, by listening for data update events of the MongoDB database, automatically synchronizes data of different structural types to the Elasticsearch database; it is stable, reliable and performs well.

(5) Rich semantic tag construction. By creating a tag model and setting under it the monitored content, the application range and the multi-dimensional high-screen conditions, each call session or each sentence can be semantically tagged. Meanwhile, to avoid recalculating all existing tag data after a tag's monitored content, application range or high-screen conditions are changed, the tag results of each call session or each sentence are stored in the file system every time tag data is calculated. For tags whose content has not changed, the results are read directly from the tag file; for tags whose content has changed, the algorithm service is called again to calculate the tag data, and the new results are written back to the file system.

(6) High-screen design based on tagged sessions. According to the query conditions, the system defines a session-level tagged data model and a sentence-level tagged data model. The session-level data model adopts a multi-tag-dimension design and is used for multi-dimensional high-screen queries without sentence semantic search. The sentence-level data model also adopts a multi-tag-dimension design; it redundantly stores part of the dimension data of the session-level data model and adds a sentence-vector dimension, and is mainly used for multi-dimensional high-screen queries with sentence semantic search. Using different data models for different query scenarios makes queries both efficient and more accurate.

(7) Fast and accurate semantic search over data volumes on the order of hundreds of millions of records is supported. By constructing rich semantic tags and using the high-screen design based on tagged sessions, different data models are used in multi-round conversation text search according to the search conditions. For a multi-dimensional high-screen query without semantic search, results are retrieved directly through an exact query against the Elasticsearch database. For a multi-dimensional high-screen query with semantic search, the search mode depends on whether the query is a word or a sentence. For a word search, an exact-match query is performed directly through the Elasticsearch phrase query match_phrase. For a sentence search, two concurrent paths are used: one recalls through Elasticsearch BM25, and the other calls the algorithm to compute a semantic vector and then recalls through Elasticsearch ANN. The results of the two paths are then merged and sorted, unmatched data is removed by the algorithm fine-ranking model, and finally the top K results are returned. Meanwhile, to speed up queries for the next page, the matched data obtained by asynchronous pre-calculation is placed in a cache, so a next-page query can read the cache directly and return the result.

(8) A complete scheme is designed that spans heterogeneous data synchronization, rich semantic tag construction, high-screen design based on tagged session data, and fast, accurate semantic search, providing a method for building an efficient and accurate voice retrieval system. The self-developed heterogeneous data synchronization middleware listens for update events of the MongoDB database and efficiently and accurately synchronizes the updated data to the Elasticsearch database used for retrieval. Through the custom tag model, a set of rich semantic tags can be built for each call session or each sentence, so that data can be queried precisely along these tag dimensions. Based on an Elasticsearch database with semantic vector search capability, the session-level tagged data model Conversation and the sentence-level tagged data model Sentence are defined, which support multi-dimensional high-screen queries both with and without semantic search. Through techniques such as multi-path concurrent search, merging and sorting of multi-path results, sentence-vector calculation, the algorithm's fine-ranking removal model, pre-search calculation and caching, queries over more than a hundred million records can be supported with excellent performance and accurate results, greatly improving the user experience.
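The tag-result reuse described in effect (5) can be illustrated with the sketch below. How a changed tag is detected is not stated in the text; hashing the tag definition is an assumption made here, as are the function names, the in-memory `stored` dictionary standing in for the tag file, and the `compute` callable standing in for the algorithm service.

```python
import hashlib
import json


def definition_fingerprint(definition):
    """Hash a tag definition so that any change to it can be detected."""
    payload = json.dumps(definition, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def compute_tags(definitions, stored, compute):
    """Reuse stored results for unchanged tags and recompute only changed ones.

    definitions: {tag_name: definition dict}
    stored:      {tag_name: {"fingerprint": str, "result": object}},
                 a stand-in for the tag file in the file system
    compute:     callable(tag_name, definition) -> result,
                 a stand-in for the algorithm service
    """
    recomputed = []
    for name, definition in definitions.items():
        fingerprint = definition_fingerprint(definition)
        entry = stored.get(name)
        if entry is None or entry["fingerprint"] != fingerprint:
            result = compute(name, definition)
            stored[name] = {"fingerprint": fingerprint, "result": result}
            recomputed.append(name)
    results = {name: stored[name]["result"] for name in definitions}
    return results, recomputed
```

Only the tags listed in `recomputed` trigger a call to the algorithm service; every other result comes from the stored copy.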

It can be seen that, in the voice retrieval method based on semantic analysis and high-dimensional modeling described in FIG. 1, text data transcribed from voice call data is first obtained and stored in a newly built basic database. A retrieval database is then constructed and data synchronization is established between the basic database and the retrieval database; during synchronization, multi-dimensional semantic tags and a text data model are constructed, and the screening conditions used for text data retrieval are generated from them. At retrieval time, after the search text and the required screening conditions are determined, the corresponding text data in the retrieval database is screened by high-screen processing; recall candidate sets are then generated through text recall based on statistical data, text recall based on semantics, and text recall based on context association, and the final recall candidate set corresponding to the search text and the screening conditions is obtained after sorting and filtering. The method realizes semantics-based recall, which improves the quality of text recall, and realizes recall based on the context embedding of the search text, which solves the problem that existing retrieval cannot take context information into account. It combines the respective strengths of the three recall modes and improves the accuracy of text recall as far as possible while preserving recall performance.
In addition, the high-screen design based on multi-dimensional semantic tags and the text data model enables fast and accurate voice retrieval over data volumes above the hundred-million level; multi-path concurrent retrieval, merging and sorting of multi-path results, algorithmic fine filtering, and pre-search calculation with caching are also introduced, which greatly improves retrieval efficiency while ensuring retrieval accuracy.

Example two

Referring to fig. 5, fig. 5 is a schematic diagram of a speech retrieval system based on semantic analysis and high-dimensional modeling according to an embodiment of the present invention. As shown in fig. 5, the speech retrieval system based on semantic analysis and high-dimensional modeling may include:

the data synchronization module is used for constructing a search database, establishing data synchronization between a basic database and the search database, constructing multidimensional semantic tags for text data according to service requirements, establishing a text data model defined based on text length, and establishing screening conditions of the text data through the semantic tags and the text data model;

For the specific description of the voice retrieval system based on the semantic analysis and the high-dimensional modeling, reference may be made to the specific description of the voice retrieval method based on the semantic analysis and the high-dimensional modeling, which is not described in detail herein.

Example three

The embodiment of the invention discloses a computer storage medium which stores computer instructions for executing the steps in the voice retrieval method based on semantic analysis and high-dimensional modeling disclosed in the first embodiment of the invention when the computer instructions are called.

The apparatus embodiments described above are merely illustrative, in which the modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical, i.e., may be located in one place, or may be distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

From the above detailed description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on this understanding, the essence of the foregoing technical solutions, or the part that contributes to the prior art, may be embodied in the form of a software product that may be stored in a computer-readable storage medium, including read-only memory (ROM), random access memory (RAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.

Finally, it should be noted that the voice retrieval method based on semantic analysis and high-dimensional modeling disclosed in the embodiments of the invention is only a preferred embodiment, used to illustrate the technical solution of the invention and not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the various embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims

1. A speech retrieval method based on semantic analysis and high-dimensional modeling, the method comprising:

obtaining voice data and translating the voice data into text data;

constructing a basic database, and storing text data in the basic database;

determining search text and screening conditions;

2. The semantic analysis and high-dimensional modeling based voice retrieval method according to claim 1, wherein recall of the second and third recall candidate sets is performed by a double-tower model, wherein the double-tower model comprises a search text tower and a document tower;

3. The voice retrieval method based on semantic analysis and high-dimensional modeling according to claim 1, wherein the text recall is performed based on the text data after the high-screen processing to obtain a first recall candidate set corresponding to the search text, comprising:

4. The semantic analysis and high-dimensional modeling based speech retrieval method according to claim 1, wherein the ranking process comprises:

5. The speech retrieval method based on semantic analysis and high-dimensional modeling according to claim 1, wherein the establishing of data synchronization of the base database and the retrieval database comprises:

6. The voice retrieval method based on semantic analysis and high-dimensional modeling according to claim 1, wherein the constructing a multi-dimensional semantic tag for text data according to business requirements comprises:

7. The speech retrieval method based on semantic analysis and high-dimensional modeling according to claim 1, wherein the text-based application scenario defines a text data model, comprising:

8. The speech retrieval method based on semantic analysis and high-dimensional modeling according to claim 1, wherein the performing the high-screening process on the text data in the retrieval database based on the screening condition comprises:

9. A speech retrieval system based on conversational semantic analysis and high-dimensional modeling, the system comprising:

the data synchronization module is used for constructing a search database, establishing data synchronization between a basic database and the search database, constructing multidimensional semantic tags for text data according to service requirements, defining a text data model based on an application scene of the text, and creating screening conditions of the text data through the semantic tags and the text data model;

10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the semantic analysis and high-dimensional modeling based speech retrieval method according to any one of claims 1-8.