CN108132952B - Active type searching method and device based on voice recognition - Google Patents


Info

Publication number
CN108132952B
CN108132952B
Authority
CN
China
Prior art keywords
information
search
voice
user
domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611091688.4A
Other languages
Chinese (zh)
Other versions
CN108132952A (en)
Inventor
项连志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baidu Online Network Technology Beijing Co Ltd filed Critical Baidu Online Network Technology Beijing Co Ltd
Priority to CN201611091688.4A priority Critical patent/CN108132952B/en
Priority to PCT/CN2017/076968 priority patent/WO2018098932A1/en
Publication of CN108132952A publication Critical patent/CN108132952A/en
Application granted granted Critical
Publication of CN108132952B publication Critical patent/CN108132952B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • G06F16/3343Query execution using phonetics
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/26Speech to text systems
    • G10L17/00Speaker identification or verification
    • G10L17/26Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices

Abstract

The invention provides an active search method and device based on speech recognition. Received voice information is recognized to obtain a voice feature vector of the voice information; whether the voice feature vector matches a reference voice feature vector preset by a user is judged, and a corresponding judgment result is obtained; if the judgment result is a mismatch, a topic domain corresponding to the voice information is generated based on the recognition of the voice information; the topic domain is mapped and expanded into a corresponding search structure; and an active matching search is performed according to the search structure, and the corresponding topic information is acquired and presented to the user. Compared with the prior art, the method and device use whether the voice feature vector of the received voice information matches the reference voice feature vector as the condition for starting the search function, feed related information or a summary back to the user in real time, help the user acquire unknown information promptly and quickly, and alleviate the problem of information asymmetry in conversation and other language environments.

Description

Active type searching method and device based on voice recognition
Technical Field
The invention relates to the field of computer technology, and in particular to an active search technique based on speech recognition.
Background
In human life, communication is the most basic social activity, yet scenes of poor communication often arise: in a social setting, one party chats about music while the other, knowing little about the field, is left slightly embarrassed; in a business negotiation, one side raises a new concept or method and the discussion stalls; in an expert consultation, the expert expounds numerous ideas and opinions that the listeners cannot follow, so the exchange cannot deepen. Blocked communication reduces communication efficiency, and opportunities for business and for building personal connections may be lost. The root causes are unequal breadth and depth of information between the parties (information asymmetry) and delays in acquiring information.
Existing information acquisition relies mainly on search engines, which to a great extent solve the difficulty of obtaining relevant information from a massive corpus, but two problems arise when they are applied to live communication: 1) the relevance of the retrieved information depends on the keywords the user enters. Most current search engines index information by keywords, and when a user cannot construct suitable keywords the relevance of the results drops sharply; 2) information acquisition lags. The user must first construct a keyword and then enter it into a search engine to obtain related information, a sequence of steps that lags behind the flow of the conversation.
Therefore, how to provide an active search technique based on speech recognition that helps users obtain unknown information in time has become one of the technical problems that those skilled in the art urgently need to solve.
Disclosure of Invention
The invention aims to provide an active type searching method and device based on voice recognition.
According to an aspect of the present invention, there is provided an active type search method based on speech recognition, wherein the method comprises the steps of:
a. recognizing the received voice information and obtaining a voice feature vector of the voice information;
b. judging whether the voice feature vector matches a reference voice feature vector preset by a user, and obtaining a corresponding judgment result;
c. if the judgment result is a mismatch, generating a topic domain corresponding to the voice information based on the recognition of the voice information;
d. mapping and expanding the topic domain into a corresponding search structure;
e. performing an active matching search according to the search structure, and acquiring the corresponding topic information and presenting it to the user.
Preferably, the step c includes:
if the judgment result is not matched, the voice information is translated into a natural language text based on the recognition of the voice information;
and performing natural semantic analysis on the natural language text to generate a theme domain corresponding to the voice information.
Preferably, the subject domain includes at least any one of:
a field of information;
an information intent;
an information intent object.
More preferably, the subject domain comprises an information field, an information intention and an information intention object, wherein the step d comprises:
determining a corresponding theme template according to the information field and the information intention of the theme domain;
generating an extension keyword list by combining the information intention object according to the theme template, and filling the theme domain and the extension keyword list into the theme template;
and mapping and expanding the filled theme template into a corresponding search structure.
Preferably, the step d includes:
and mapping and expanding the topic domain into the search structure according to the attributes of the search engine used for the active matching search.
Preferably, the step e comprises:
and performing an active matching search according to the search structure in combination with cognitive computing, and acquiring the corresponding topic information and presenting it to the user.
Preferably, the step e comprises:
performing active matching search according to the search structure to obtain corresponding candidate information;
generating the subject information based on information extraction and integration of the candidate information;
and presenting the theme information to the user.
According to another aspect of the present invention, there is also provided an active type search apparatus based on voice recognition, wherein the search apparatus includes:
the recognition device is used for recognizing the received voice information and obtaining a voice feature vector of the voice information;
the judging device is used for judging whether the voice feature vector is matched with a reference voice feature vector preset by a user or not and obtaining a corresponding judgment result;
the generating device is used for generating a topic domain corresponding to the voice information based on the recognition of the voice information if the judgment result is a mismatch;
the mapping device is used for mapping and expanding the topic domain into a corresponding search structure;
and the presentation device is used for actively matching and searching according to the search structure, acquiring corresponding theme information and presenting the theme information to the user.
Preferably, the generating means is configured to:
if the judgment result is not matched, the voice information is translated into a natural language text based on the recognition of the voice information;
and performing natural semantic analysis on the natural language text to generate a theme domain corresponding to the voice information.
Preferably, the subject domain includes at least any one of:
a field of information;
an information intent;
an information intent object.
Preferably, the topic domain comprises an information field, an information intention and an information intention object, wherein the mapping device is configured to:
determining a corresponding theme template according to the information field and the information intention of the theme domain;
generating an extension keyword list by combining the information intention object according to the theme template, and filling the theme domain and the extension keyword list into the theme template;
and mapping and expanding the filled theme template into a corresponding search structure.
Preferably, the mapping device is configured to map and expand the topic domain into the search structure according to the attributes of the search engine used for the active matching search.
Preferably, the presentation device is configured to:
and performing an active matching search according to the search structure in combination with cognitive computing, and acquiring the corresponding topic information and presenting it to the user.
Preferably, the presentation device comprises:
the acquisition unit is used for carrying out active matching search according to the search structure to acquire corresponding candidate information;
the integration unit is used for generating the topic information based on information extraction and integration of the candidate information;
and the presentation unit is used for presenting the theme information to the user.
Compared with the prior art, the invention has the following advantages:
the voice information is continuously and actively acquired, whether the voice feature vector of the voice information is matched with the preset reference voice feature vector is judged, namely whether a sender of the voice information is a preset user is judged, the judged result is used as a condition for judging whether a search function is started, then the theme information of the information corresponding to the voice information is acquired through search operation and is presented to the user, the user is helped to timely and quickly acquire unknown information, and the problem of information asymmetry in communication or other language environments is solved.
Furthermore, the method and the device can more accurately obtain the keywords in the voice information by adopting a natural semantic analysis technology, and expand or delete the keywords according to the semantics of the voice information, thereby improving the accuracy and hit rate of the search result and improving the use experience of the user.
Furthermore, the method and device generate the topic domain corresponding to the voice information based on the recognition of the voice information, and the information field, information intention and information intention object in the topic domain structure supplement and correct the keywords in the topic domain, which improves search accuracy, reduces noise in the search process, and improves the user experience.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 illustrates a schematic diagram of an active search apparatus based on speech recognition, according to an aspect of the present invention;
FIG. 2 illustrates a search structure diagram when using the Baidu search engine according to one embodiment of the invention;
FIG. 3 illustrates a flow diagram of an active search method based on speech recognition in accordance with another aspect of the subject innovation.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The term "computer device" or "computer" in this context refers to an intelligent electronic device that can execute predetermined processes such as numerical calculation and/or logic calculation by running predetermined programs or instructions, and may include a processor and a memory, wherein the processor executes a pre-stored instruction stored in the memory to execute the predetermined processes, or the predetermined processes are executed by hardware such as ASIC, FPGA, DSP, or a combination thereof. Computer devices include, but are not limited to, servers, personal computers, laptops, tablets, smart phones, and the like.
It should be noted that the user equipment, the network device, the network, etc. are only examples, and other existing or future computer devices or networks may also be included in the scope of the present invention, and are included by reference.
The methods discussed below, some of which are illustrated by flow diagrams, may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine or computer readable medium such as a storage medium. The processor(s) may perform the necessary tasks.
Specific structural and functional details disclosed herein are merely representative and are provided for purposes of describing example embodiments of the present invention. The present invention may, however, be embodied in many alternate forms and should not be construed as limited to only the embodiments set forth herein.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element may be termed a second element, and, similarly, a second element may be termed a first element, without departing from the scope of example embodiments. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, there are no intervening elements present. Other words used to describe the relationship between elements (e.g., "between" versus "directly between", "adjacent" versus "directly adjacent to", etc.) should be interpreted in a similar manner.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be noted that, in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently, or the figures may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The present invention is described in further detail below with reference to the attached drawing figures.
Fig. 1 illustrates a schematic structure of an active search apparatus based on speech recognition according to an aspect of the present invention. The search apparatus 1 includes: a recognition device 101, a determining device 102, a generating device 103, a mapping device 104 and a presentation device 105.
The following description takes as an example the case where the search apparatus 1 is located in a network device. The network device interacts with a user device: it receives voice information collected by the user device, recognizes the received voice information, obtains a voice feature vector of the voice information, and determines whether the voice feature vector matches a reference voice feature vector preset by a user. If the determination result is a mismatch, it generates a topic domain corresponding to the voice information based on the recognition of the voice information, maps and expands the topic domain into a corresponding search structure, performs an active matching search according to the search structure, acquires the corresponding topic information, and returns it to the user equipment so as to present it to the user.
Here, the user equipment includes, but is not limited to, a personal computer, a laptop computer, a tablet computer, a smart phone, a PDA, Virtual Reality (VR) glasses, a Virtual Reality helmet, a smart headset, and the like. The user equipment collects voice information through a voice collecting module and sends it to the search apparatus 1 over the network, so that the search apparatus 1 can perform voice recognition on receipt, use the recognition result as the condition for starting the search, and obtain and present the corresponding topic information to the user.
The recognition device 101 recognizes the received voice information and obtains a voice feature vector of the voice information. Specifically, the user equipment continuously collects voice information through a dedicated module, for example a voice collecting module, and transmits it to the search apparatus 1 through the network; alternatively, the search apparatus 1 periodically obtains the voice information collected by the user equipment directly from the user equipment through an agreed communication mode, such as the http or https protocol. After the search apparatus 1 obtains the voice information, noise is first filtered out by a filtering means, and the filtered voice information is preprocessed by the recognition device 101, for example by sampling, quantization, pre-emphasis, and windowing. The voice feature vectors are then extracted, for example by Mel Frequency Cepstral Coefficient (MFCC) extraction, Linear Predictive Coding (LPC) extraction, high-frequency-domain emphasis extraction, or window-function extraction. The voice feature vector is a voice feature parameter used for speaker recognition (voiceprint recognition); whether the voice information behind two voice feature vectors comes from the same user can generally be determined by judging whether the two vectors match. The Mel frequency scale is derived from the auditory characteristics of the human ear and has a nonlinear correspondence with frequency in Hz.
For example, in a social setting, user A chats about music while user B knows little about the topic. B can quickly acquire the music-related information corresponding to the voice information uttered by A through the search apparatus 1. Specifically, before the conversation, B activates the search function, for example by tapping the corresponding area of the application interface of the search apparatus 1 on the user equipment. When A speaks, the user equipment collects the voice information through its voice receiving module and sends it to the search apparatus 1 over the network. After the search apparatus 1 receives the voice information, the recognition device 101 preprocesses it: a Fast Fourier Transform (FFT) is performed on each frame to obtain a spectrum, the magnitude spectrum is taken, a Mel filter bank is applied to the magnitude spectrum, the outputs of all filters undergo a logarithm operation, and a Discrete Cosine Transform (DCT) then yields the voice feature parameters (MFCCs), giving the corresponding voice feature vector.
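The MFCC pipeline just described (pre-emphasis, framing, windowing, FFT magnitude, Mel filter bank, logarithm, DCT) can be sketched with NumPy alone. This is an illustrative sketch, not the patent's implementation: the frame size, hop, filter count and coefficient count are common defaults chosen for the example.

```python
import numpy as np

def preemphasis(x, alpha=0.97):
    # boost high frequencies: y[n] = x[n] - alpha * x[n-1]
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_signal(x, frame_len, hop):
    # slice the signal into overlapping frames
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx]

def mel_filterbank(n_filters, n_fft, sr):
    # triangular filters spaced evenly on the Mel scale
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def dct2(x):
    # DCT-II along the last axis: X[m] = sum_k x[k] cos(pi*m*(2k+1)/(2n))
    n = x.shape[-1]
    k = np.arange(n)
    basis = np.cos(np.pi * np.outer(np.arange(n), 2 * k + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    frames = frame_signal(preemphasis(np.asarray(signal, float)), frame_len, hop)
    frames = frames * np.hamming(frame_len)          # windowing
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    log_energies = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    return dct2(log_energies)[:, :n_ceps]            # one feature vector per frame
```

With 16 kHz input, a 25 ms frame and 10 ms hop, one second of audio yields 98 frames of 13 coefficients each; an utterance-level voiceprint vector could then be formed, for example, by averaging over frames.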
It should be understood by those skilled in the art that the above-mentioned method for obtaining the speech feature vector is only an example, and that existing or future methods for obtaining the speech feature vector, which may be applied to the present invention, are included in the scope of the present invention and are herein incorporated by reference.
The determining device 102 determines whether the voice feature vector matches a reference voice feature vector preset by a user and obtains a corresponding determination result. Specifically, the user sets a reference voice feature vector in the search apparatus 1 in advance: voice information of a predetermined user is received beforehand and recognized by the recognition device 101 to obtain its voice feature vector, which is then stored as the reference voice feature vector. It is used to determine whether the voice feature vector of voice information subsequently recognized by the search apparatus 1 matches the reference, that is, whether the originator of that voice information is the predetermined user; the corresponding determination result is obtained and used as the condition for starting the search. The predetermined user may be chosen according to the specific use case.
For example, user C sends voice information to the search apparatus 1; the recognition device 101 recognizes it and obtains its voice feature vector, which is taken as the reference voice feature vector. Whether voice information subsequently received by the search apparatus 1 comes from user C can then be judged against this reference, and a corresponding response made according to the result. 1) In an expert consultation, where the expert expounds many ideas and opinions, the search apparatus 1 receives the voice information, recognizes it, and obtains a voice feature vector x. The determining device 102 compares x with the reference voice feature vector and judges whether the two match. If they match, the voice information was uttered by user C and is not processed further; if they do not match, the voice information was uttered by someone other than user C, so the search apparatus 1 carries out the subsequent search operation, obtains the corresponding search results, and provides them to user C. User C can then understand the ideas and opinions expounded by the expert, communicate and consult with the expert effectively, and improve the efficiency and quality of their exchange. 2) When user C uses the search apparatus 1 in a relatively noisy environment to perform a voice search based on voice information he utters himself, the search apparatus 1 receives the voice information, recognizes it, and obtains a voice feature vector y. The determining device 102 compares y with the reference voice feature vector and judges whether the two match. If they match, the voice information was uttered by user C, and the search apparatus 1 carries out the subsequent search operation, obtains the corresponding search results, and provides them to user C; if they do not match, no further processing is carried out.
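A minimal sketch of the matching test performed by the determining device 102, assuming the voiceprint vectors can be compared by cosine similarity against a threshold (the patent does not prescribe a particular distance measure or threshold; both are illustrative choices here):

```python
import numpy as np

def is_same_speaker(feature_vec, reference_vec, threshold=0.85):
    """Return True when the cosine similarity of the two voiceprint
    vectors exceeds the threshold, i.e. the utterance is attributed to
    the pre-registered user. The 0.85 threshold is an illustrative value."""
    a = np.asarray(feature_vec, dtype=float)
    b = np.asarray(reference_vec, dtype=float)
    sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return sim >= threshold

def should_search(feature_vec, reference_vec):
    # In the expert-consultation scenario, the search is triggered
    # only when the speaker does NOT match the predetermined user.
    return not is_same_speaker(feature_vec, reference_vec)
```

In the noisy-environment scenario the condition is simply inverted: the search is triggered only when `is_same_speaker` returns True.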
If the determination result is a mismatch, the generating device 103 generates a topic domain corresponding to the voice information based on the recognition of the voice information. Specifically, suppose the user wants to search for information corresponding to speech uttered by anyone other than a predetermined user, and the voice feature vector of the predetermined user's speech is used as the reference voice feature vector. If the determination result is a mismatch, the utterer of the received voice information is not the predetermined user, which meets the user's condition for searching. The generating device 103 then identifies a specific acoustic model through feature extraction and pattern matching, forms a language model through acoustic model training, performs fast optimization in the space formed by the acoustic model and the language model, and converts the voice information into text. Semantic analysis is performed on the text: for example, the text is segmented with a full-segmentation word segmentation technique to obtain the corresponding keywords, feature values are computed according to a preset feature model, the best-matching topic domain is selected, and the keywords are filled into the topic domain. The topic domain includes, but is not limited to: an information domain (domain); an information intent (intent); an information intent object (object).
The topic domain (which can be regarded as a feature structure) can be obtained in two ways: first, feature parameters are set by manual annotation; second, a large amount of basic data is captured and feature parameters are generated by machine learning. The second approach is sometimes combined with manual supervision and correction for better results. Data or resources of the same type, together with the services provided around them, are called an information domain (domain); information-domain data is generally structured table data with a primary key (main attribute), and the information domain is generally named with a noun. The information intent (intent) represents the user's operation on information-domain data, such as querying, querying the value of a certain attribute, booking, or dialing, and is generally named with a verb. The information intent object (object) qualifies the intent: it consists of the parameters required to realize the intent, embodied by the main keywords of the text obtained through word segmentation. The information domain, information intent and information intent object together can be regarded as the topic domain structure of the topic domain.
For example, user D, who uses the search apparatus 1 and whose voice feature vector serves as the reference voice feature vector, discusses returning to their hometown with friend E. D says: "I have already booked my train ticket back to Qingdao." Friend E says: "I need to go back to Qingdao soon too, but I don't know whether train tickets from Beijing to Qingdao can still be booked." The recognition device 101 recognizes each piece of received voice information and obtains the corresponding voice feature vectors. The determining device 102 judges each of them: the voice feature vector of the voice information uttered by user D matches the reference voice feature vector, so it is not processed further; the voice feature vector of the voice information uttered by friend E does not match the reference, i.e. its sender is not the predetermined user, so the generating device 103 converts that voice information into text, segments the text through semantic analysis, computes feature values on the segmented text according to a preset feature model, matches the best topic domain, and fills the text content into it, obtaining a simple topic domain structure:
domain: train;
intent: reserve;
intent object:
{
departure station: Beijing;
arrival station: Qingdao;
}
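The patent discloses no code; purely as a hypothetical illustration, the simple subject domain structure above could be represented as a small data type (all names here are this sketch's own, not the patent's):

```python
from dataclasses import dataclass, field

@dataclass
class TopicDomain:
    """Subject domain structure: information domain, information intent,
    and information intent object (main keywords from word segmentation)."""
    domain: str                 # named with a noun, e.g. "train"
    intent: str                 # named with a verb, e.g. "reserve"
    intent_object: dict = field(default_factory=dict)

# The filled structure from the train-ticket example:
td = TopicDomain(
    domain="train",
    intent="reserve",
    intent_object={"departure station": "Beijing", "arrival station": "Qingdao"},
)
```

The main key (primary attribute) of the information domain would identify which table the intent operates on; the intent object supplies the parameters needed to realize the intent.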
It will be understood by those skilled in the art that the subject domains described above are by way of example only, and that other subject domains, existing now or becoming available in the future, that are applicable to the present invention are also intended to be encompassed within its scope and are hereby incorporated by reference.
The mapping device 104 maps and expands the subject domain into a corresponding search structure.
Specifically, the mapping device 104 maps the keywords in the filled subject domain according to a preset rule, for example through a certain template, and expands them correspondingly, for example by generating additional keywords from the template's information and supplementing the keyword information in the subject domain, thereby obtaining a corresponding complete search structure.
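As a minimal sketch of this template-based expansion (the rule format and function names are hypothetical, not disclosed by the patent), the template can supply keywords to add and transport modes to exclude around the main keywords of the intent object:

```python
# Hypothetical sketch: a topic template carries preset rules that add
# supplementary keywords and exclude irrelevant ones.
def expand_keywords(topic_domain, template_rules):
    """Return (included, excluded): the main keywords plus the template's
    additions, and the sorted exclusion list."""
    keywords = list(topic_domain["object"].values())   # main keywords
    keywords += template_rules.get("add", [])          # supplemented keywords
    excluded = set(template_rules.get("remove", []))
    return [kw for kw in keywords if kw not in excluded], sorted(excluded)

domain = {"domain": "train", "intent": "reserve",
          "object": {"from": "Beijing", "to": "Qingdao"}}
rules = {"add": ["within one week"], "remove": ["airplane", "bus", "ship"]}
included, excluded = expand_keywords(domain, rules)
```

The included list plus the exclusion list together would form all the search keywords of the eventual search structure.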
The presentation device 105 performs an active matching search according to the search structure, acquires the corresponding subject information, and presents it to the user. Specifically, the presentation device 105 performs the active matching search according to the search structure determined by the mapping device 104 and obtains a large number of related search results. The presentation device 105 then examines the search results: if a search result is already subject information, it is presented to the user directly; if it is not, the presentation device 105 extracts subject information from it and presents the refined, aggregated main information to the user through the screen of the smart device or the wearable earphone that received the voice information, helping the user quickly acquire the specific content related to the voice information and obtain unknown information in a timely and accurate manner.
Here, the presentation device 105 may obtain the corresponding subject information as follows: 1) for a single document, an automatic summarization technique, for example understanding-based or structure-based automatic summarization, is applied to the obtained search result to produce concise and coherent subject information; 2) for multiple documents, search results with similar content are first grouped by a clustering method and subject information is then extracted with a summarization technique, where the clustering method computes a feature value for the document corresponding to each search result and aggregates the documents into categories. The feature parameters and the seed document feature values required for computing each document's feature value are trained in advance through a training model.
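The multi-document path above (feature value per document, then aggregation into categories) could look like the following toy sketch, assuming bag-of-words feature vectors, cosine similarity, and a greedy single-pass grouping; the threshold and all names are this sketch's assumptions:

```python
import math
from collections import Counter

def feature_vector(text):
    """Crude per-document feature value: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(docs, threshold=0.5):
    """Greedy clustering: each document joins the first cluster whose seed
    it resembles closely enough, otherwise it starts a new cluster."""
    clusters = []
    for doc in docs:
        vec = feature_vector(doc)
        for c in clusters:
            if cosine(vec, c["seed"]) >= threshold:
                c["docs"].append(doc)
                break
        else:
            clusters.append({"seed": vec, "docs": [doc]})
    return clusters

docs = ["train ticket from beijing to qingdao",
        "beijing to qingdao train ticket booking",
        "weather forecast for qingdao tomorrow"]
groups = cluster(docs)
```

A summarization step would then run over each group; here the two ticket results fall into one category and the weather result into another.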
The search device 1 provides the user with a function for presetting a reference speech feature vector. The user presets the reference speech feature vector through this function; the device then judges whether the speech feature vector of each piece of received voice information matches the reference speech feature vector and uses the judgment result as the condition for starting the search. The searched information is organized thematically, and the resulting subject information is presented to the user. In this way the device continuously and actively receives voice information during communication or in a complex voice environment, autonomously judges it, understands and searches for the information related to it, and feeds back the related information or an abstract to the user in real time, so that the user can obtain unknown information quickly and in a timely manner, alleviating the problem of information asymmetry in communication. For example: 1) when receiving a consultation service, the reference speech feature vector is set to that of the user's own voice; the search device 1 continuously and actively receives voice information, performs no further processing if the utterer is the user, and, if the utterer is someone else, generates the subject domain corresponding to the voice information, searches according to it, and acquires for the user the background subject information, specific knowledge points, technical frameworks, and the like corresponding to the received voice information, providing an accurate and comprehensive professional answer in a timely manner; 2) when the user is receiving medical services, the search device 1 provides relevant explanations of and principles behind the doctor's statements in real time, so that the user can keep asking questions on that basis to obtain the desired information, relieving the information asymmetry between doctor and patient to a certain extent; 3) during business negotiations, the user can answer the questions or concepts raised by clients more professionally and comprehensively with the help of the search device 1, facilitating the next step of cooperation; 4) in daily life, the user can draw on the information provided by the search device 1 in real time when talking with others, across multiple fields and multiple subjects, improving the effectiveness and breadth of the communication. Further, the search device 1 will raise information accessibility to a new level and provide a new acquisition mode for those who do not know how to acquire information.
Preferably, the generating device 103 is configured to: 1) if the judgment result is a mismatch, translate the voice information into a natural language text based on recognition of the voice information; 2) perform natural semantic parsing on the natural language text to generate the subject domain corresponding to the voice information. Specifically, if the speech feature vector of the received voice information does not match the reference speech feature vector preset by the user, i.e., the utterer of the received voice information is not the predetermined user, the generating device 103 performs analog-to-digital conversion on the voice information based on its recognition to obtain the audio interval of the voice information, extracts the feature quantities of the audio data in that interval, and recognizes the voice information as the corresponding natural language text based on the feature vector; the generating device 103 then performs natural semantic parsing on the natural language text to obtain a parsing result and generates the subject domain corresponding to the voice information from that result.
Here, by using natural semantic parsing, the generating device 103 obtains the keywords in the voice information more accurately and expands or prunes them according to the semantics of the voice information, improving the accuracy and hit rate of the search results and the user experience.
Preferably, the subject domain comprises an information domain, an information intent, and an information intent object, and the mapping device 104 determines a corresponding topic template according to the information domain and the information intent of the subject domain; generates an expansion keyword list from the information intent object according to the topic template and fills the subject domain and the expansion keyword list into the topic template; and maps and expands the filled topic template into a corresponding search structure.
Specifically, when the subject domain structure comprises the information domain, the information intent, and the information intent object, each of these carries its corresponding keywords once the text content converted from the voice information has been filled into the subject domain. The mapping device 104 obtains a corresponding topic template from the topic template library through pattern matching on the information domain and the information intent, generates an expansion keyword list by combining the information intent object (the main keywords) with the rules preset in the topic template, and fills the content of the subject domain and the expansion keyword list into the corresponding positions of the topic template. An expansion keyword may be an added keyword or an excluded keyword; filling them into the topic template makes its content more complete and improves the hit rate of the search. The information intent object (main keywords) and the expansion keywords together form all the search keywords of the topic template and of the corresponding search structure. The topic template is preconfigured with a search mode, for example a search instruction and the specific search structure preset for each search instruction (this specific search structure belongs to the topic template and is distinct from the search structure corresponding to a search engine), and the content of the topic template is mapped and expanded into attributes of the search structure corresponding to the preset search engine.
The topic template library is a preset database for storing topic templates; for example, a large number of voice information samples are collected and analyzed to obtain a large number of topic templates, which are then stored in the library.
The expansion keywords enrich the search keywords, making it easier to obtain the results the user wants, while also reducing noise in the search keywords, avoiding unnecessary searches, and making the keywords more accurate. Obtaining the corresponding topic template from the subject domain, filling the subject domain content and the expansion keywords generated for that template into it, and mapping the template directly into the search structure of a search engine improves the accuracy of the search keywords, reduces the load on the search engine, raises the hit rate of the search, presents more accurate matching results to the user, and improves the user's ability to obtain information quickly in real time.
For example, continuing the example above, the voice information recognized by the recognition device 101 is: "I need to return to Qingdao soon, but I don't know whether a train ticket from Beijing to Qingdao can still be booked at present." The mapping device 104 obtains the corresponding topic template from the topic template library by pattern matching on the information domain (train) and the information intent (reserve) of the subject domain generated by the generating device 103, fills the content of the subject domain into the template, and combines the information intent object (main keywords: Beijing and Qingdao) with the rules preset in the template. For example, the template presets a date limit of less than one week for all tickets, and, because the information domain is "train", excludes transport modes such as airplane, bus, and ship, thereby generating the expansion keyword list (date: within one week; transport mode: -airplane, -bus, -ship), in which "within one week" is an added keyword and "airplane", "bus", and "ship" are excluded keywords. The subject domain content and the expansion keywords are then filled into the obtained topic template:
topic template name: train & reserve
search instruction:
origin: { Beijing };
destination: { Qingdao };
date: { optional; defaults to within one week if not set }
priority class: { optional }
excluded transport modes: { airplane, bus, ship }
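The filled template above can be flattened into a plain keyword query for a web search engine. As a hypothetical sketch (the `-term` exclusion syntax is a common search-engine convention, not something the patent specifies):

```python
def template_to_query(template):
    """Flatten a filled topic template into a keyword query string;
    excluded transport modes become minus-prefixed terms."""
    parts = [template["origin"], template["destination"], template["date"]]
    parts += ["-" + term for term in template.get("exclude", [])]
    return " ".join(parts)

filled = {"origin": "Beijing", "destination": "Qingdao",
          "date": "within one week", "exclude": ["airplane", "bus", "ship"]}
query = template_to_query(filled)
# query: "Beijing Qingdao within one week -airplane -bus -ship"
```

A real mapping would instead populate the engine-specific search structure (such as the one of FIG. 2), but the principle of carrying main keywords, added keywords, and exclusions through is the same.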
If the search device 1 performs the search with the Baidu search engine, the content of the topic template is mapped into the search structure corresponding to that engine, for example the search structure shown in FIG. 2, and an active matching search is performed to obtain the corresponding subject information and present it to the user. FIG. 2 illustrates a framework of the search structure when the Baidu search engine is used, according to one embodiment of the present invention.
Preferably, the mapping device 104 maps and expands the subject domain into the search structure according to the attributes of the search engine used for the active matching search. Specifically, the mapping device 104 constructs a search instruction from the information domain, the information intent, the information intent object, or any combination of the three, and then generates the corresponding search structure according to the attributes of the search engine used for the active matching search. For example, when the user searches with the Baidu search engine, the search structure framework corresponding to the subject domain generated from recognition of the received voice information is as shown in FIG. 2, and the keywords of the search structure corresponding to the voice information are automatically filled into the input field of FIG. 2 to perform the search operation.
Preferably, the presentation device 105 performs the active matching search according to the search structure corresponding to the subject domain in combination with cognitive computing, obtains the corresponding subject information, and presents it to the user. Specifically, the presentation device 105 matches the information that satisfies the search structure in combination with cognitive computing, obtains the subject information of that information, and presents it to the user; for example, it learns from and analyzes the keyword information in the search structure, perceives the user's needs from it, obtains highly relevant information for the user, organizes the search results thematically, and presents the resulting subject information to the user, improving the intelligence of the search device 1 in performing automatic matching searches. Cognitive computing processes unstructured information by accumulating simple unit-level computations, thereby simulating the thinking and cognition of the human brain.
Preferably, the presentation device 105 comprises: an acquisition unit 1051 (not shown), an integration unit 1052 (not shown) and a presentation unit 1053 (not shown).
The obtaining unit 1051 performs an active matching search according to the search structure corresponding to the subject domain and obtains corresponding candidate information. Specifically, the obtaining unit 1051 performs the automatic matching search according to the search structure and obtains a large number of related search results. Because the display interface of the presentation device 105 is limited and the user's time for absorbing information is also limited, this large set of related search results is treated as candidate information and must be refined before being presented to the user.
The integration unit 1052 extracts and integrates the content of the candidate information to generate the subject information. Specifically, a resource and rule base is preset; the integration unit 1052 determines a topic according to the search structure, extracts description vectors related to the topic from the search results using the resources and rules stored in the base, and generates global information about the topic from those vectors as the subject information, so that the user can quickly acquire the relevant knowledge. For example, when the user inquires about "urticaria", the search performed by the search device 1 yields a large number of results, and the limitations of the presentation device 105 make it impossible to present them all at once. To let the user quickly grasp the main knowledge, the integration unit 1052 summarizes urticaria as the following subject information: wind clusters; wheals; a skin disease; sudden appearance of red swellings on local or systemic skin; rapid onset and rapid disappearance; acute itching; allergy. After seeing this global information, the user has a general understanding of the disease.
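One simple way to realize this integration step, purely as an illustrative sketch (the scoring rule and all names are assumptions of this sketch, not the patent's resource and rule base), is to rank candidate snippets by their overlap with the topic keywords and keep only the top descriptors:

```python
def integrate(topic_keywords, candidates, top_k=3):
    """Rank candidate snippets by how many topic keywords they contain
    and keep the top-k as the aggregated subject information."""
    scored = []
    for snippet in candidates:
        words = set(snippet.lower().split())
        score = sum(1 for kw in topic_keywords if kw in words)
        scored.append((score, snippet))
    scored.sort(key=lambda pair: -pair[0])   # highest overlap first
    return [snippet for _, snippet in scored[:top_k]]

topic = ["urticaria", "skin", "itching"]
candidates = [
    "urticaria is a common skin disease",
    "sudden red wheals with acute itching on the skin",
    "unrelated page about train tickets",
]
summary = integrate(topic, candidates, top_k=2)
```

Off-topic results fall to the bottom of the ranking and are dropped, leaving a short digest that fits the limited display interface.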
The presentation unit 1053 presents the subject information to the user. Specifically, the presentation manner includes, but is not limited to: presentation through an audio device; presentation through a video device; presentation through a wearable device. The audio device is, for example, a smart headset or smart earphone; the video device is, for example, Virtual Reality (VR) glasses or smart glasses that switch freely between an ordinary-glasses function and a projection function; the wearable device includes a smart watch, a smart helmet, smart clothing, and the like.
FIG. 3 illustrates a flow diagram of an active search method based on speech recognition in accordance with another aspect of the present invention.
In step S301, the search device 1 recognizes the received voice information and obtains its speech feature vector. Specifically, the user equipment continuously collects voice information through a dedicated module, for example a voice collection module, and transmits it to the search device 1 over the network; alternatively, the search device 1 periodically obtains the collected voice information directly from the user equipment through an agreed communication mode, such as the http or https protocol. After the search device 1 obtains the voice information, a filtering device first removes noise from it; then, in step S301, the search device 1 preprocesses the filtered voice information, for example by sampling, quantization, pre-emphasis, and windowing, and extracts the speech feature vector, for example with a Mel Frequency Cepstral Coefficient (MFCC) method, a Linear Predictive Coding (LPC) method, a high-frequency-domain emphasis method, or a window-function method. The speech feature vector is a speech feature parameter used for speaker recognition (voiceprint recognition): whether two pieces of voice information come from the same user can generally be determined by judging whether their two speech feature vectors match. The Mel frequency is derived from the auditory characteristics of the human ear and has a nonlinear correspondence with frequency in Hz.
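The preprocessing steps named above (pre-emphasis, framing, windowing) can be sketched with NumPy as follows; the frame length, hop size, and pre-emphasis coefficient are typical values chosen for illustration, not values the patent prescribes:

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasis, framing, and Hamming windowing of a raw waveform."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Slice into overlapping frames
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # Apply a Hamming window to each frame to reduce spectral leakage
    return frames * np.hamming(frame_len)

signal = np.random.randn(16000)  # one second of audio at 16 kHz
frames = preprocess(signal)
```

With a 25 ms frame (400 samples at 16 kHz) and a 10 ms hop, one second of audio yields 98 windowed frames ready for feature extraction.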
For example, in a social setting, user A chats about music while user B knows nothing about the topic. B can quickly acquire the music-related information corresponding to the voice information uttered by A through the search device 1. Specifically, before the conversation, B activates the search function by tapping the corresponding function area on the application interface of the search device 1 on the user equipment, or in another way. When A speaks, the user equipment collects the voice information through its voice receiving module and sends it to the search device 1 over the network. After the search device 1 acquires the voice information, in step S301 it preprocesses the received voice information: for example, a Fast Fourier Transform (FFT) is applied to each frame to obtain its spectrum, the magnitude spectrum is passed through a Mel filter bank, the outputs of all filters undergo a logarithm operation, and a Discrete Cosine Transform (DCT) is then applied to obtain the speech feature parameters (MFCC), from which the corresponding speech feature vector is derived.
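The FFT, Mel filter bank, logarithm, and DCT pipeline just described can be sketched end to end for a single frame; this is a simplified textbook-style construction (filter counts, FFT size, and sample rate are assumptions of the sketch), not the patent's implementation:

```python
import numpy as np

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular filters spaced evenly on the Mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(frame, n_coeffs=13, n_fft=512, sr=16000):
    """FFT -> power spectrum -> Mel filter bank -> log -> DCT-II."""
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    mel_energy = mel_filterbank(n_fft=n_fft, sr=sr) @ power
    log_energy = np.log(mel_energy + 1e-10)
    n = len(log_energy)
    # DCT-II written out explicitly to avoid extra dependencies
    k = np.arange(n_coeffs)[:, None]
    dct = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return dct @ log_energy

coeffs = mfcc(np.random.randn(400))
```

Stacking such coefficient vectors across frames gives the speech feature vector used for voiceprint comparison in the next step.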
It should be understood by those skilled in the art that the above-mentioned method for obtaining the speech feature vector is only an example, and that existing or future methods for obtaining the speech feature vector, which may be applied to the present invention, are included in the scope of the present invention and are herein incorporated by reference.
In step S302, the search device 1 judges whether the speech feature vector matches the reference speech feature vector preset by the user and obtains a corresponding judgment result. Specifically, the user sets the reference speech feature vector in the search device 1 in advance: for example, the voice information of a predetermined user is received in advance and recognized in step S301 to obtain its speech feature vector, which is then stored as the reference speech feature vector. It is used to judge whether the speech feature vector of voice information recognized by the search device 1 matches it, i.e., whether the utterer of that voice information is the predetermined user; the corresponding judgment result is obtained and used as the condition for starting the search. The predetermined user may be determined according to the specific use case.
For example, user C sends voice information to the search device 1. In step S301, the search device 1 recognizes the voice information, obtains its speech feature vector, and uses it as the reference speech feature vector; whether voice information subsequently received by the search device 1 comes from user C is then determined against this reference, and the device responds according to the judgment result. For example: 1) in an expert consultation setting, where the expert expounds many ideas and opinions, the search device 1 receives the voice information in that setting, recognizes it, and obtains its speech feature vector x. In step S302, the search device 1 compares x with the reference speech feature vector and judges whether the two match. If they match, the voice information was uttered by user C and is not processed further; if they do not match, the voice information was uttered by someone other than user C, and the search device 1 performs the subsequent search operation and provides the corresponding search result to user C, who can use it to understand the ideas and opinions expounded by the expert, communicate and consult with the expert effectively, and improve the efficiency and quality of their communication. 2) When user C uses the search device 1 in a relatively noisy environment to perform a voice search based on voice information uttered by user C himself, the search device 1 receives the voice information in that setting, recognizes it, and obtains its speech feature vector y. In step S302, the search device 1 compares y with the reference speech feature vector and judges whether the two match. If they match, the voice information was uttered by user C, and the search device 1 performs the subsequent search operation and provides the corresponding search result to user C; if they do not match, no further processing is performed.
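In its simplest form, the match decision of step S302 could be a similarity threshold between the extracted voiceprint vector and the reference vector; the cosine measure and threshold below are assumptions of this sketch, since the patent does not fix a particular matching rule:

```python
import math

def is_same_speaker(vec, ref, threshold=0.85):
    """Treat the utterer as the predetermined user when the cosine
    similarity of the two voiceprint vectors exceeds a preset threshold."""
    dot = sum(a * b for a, b in zip(vec, ref))
    norm = (math.sqrt(sum(a * a for a in vec))
            * math.sqrt(sum(b * b for b in ref)))
    return norm > 0 and dot / norm >= threshold

ref = [0.9, 0.1, 0.4]                                # reference voiceprint
matched = is_same_speaker([0.88, 0.12, 0.41], ref)   # near-identical vector
other = is_same_speaker([-0.2, 0.9, 0.1], ref)       # dissimilar vector
```

A matching result suppresses further processing (or triggers the search, depending on the use case), exactly as in the two scenarios above.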
If the judgment result is a mismatch, in step S303 the search device 1 generates the subject domain corresponding to the voice information based on its recognition. Specifically, suppose the user wants to search for information corresponding to voice information uttered by anyone other than a predetermined user (who may be the user himself); the speech feature vector of the predetermined user's voice serves as the reference speech feature vector. If the judgment result is a mismatch, the utterer of the received voice information is not the predetermined user, i.e., is someone else, and the condition for starting the search is met. In step S301, based on the recognized voice information, the search device 1 identifies a specific acoustic model through feature extraction and pattern matching techniques, forms a language model through acoustic model training, performs a fast search in the space formed by the acoustic model and the language model, and converts the voice information into text. The text is then semantically analyzed: for example, a full-segmentation word segmentation technique splits it into the corresponding keywords, feature values are calculated according to a preset feature model, the optimal subject domain is matched, and the keywords are filled into the subject domain. The subject domain includes, but is not limited to: an information domain (domain); an information intent (intent); an information intent object (object).
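The "feature value calculation against a preset feature model" that selects the optimal subject domain could, in a toy form, weight each candidate domain's feature keywords and pick the highest-scoring domain over the segmented text; the model contents and weights here are invented for illustration only:

```python
# Hypothetical feature model: each candidate information domain carries
# weighted feature keywords; the best-scoring domain is selected.
FEATURE_MODEL = {
    "train": {"train": 2.0, "ticket": 1.0, "station": 1.0},
    "music": {"song": 2.0, "album": 1.0, "concert": 1.0},
}

def match_domain(tokens, model=FEATURE_MODEL):
    """Score each domain as the sum of the weights of tokens it recognizes,
    then return the highest-scoring domain."""
    scores = {d: sum(w.get(t, 0.0) for t in tokens) for d, w in model.items()}
    return max(scores, key=scores.get)

tokens = ["train", "ticket", "beijing", "qingdao"]
best = match_domain(tokens)
```

In practice the patent describes learning such feature parameters from captured basic data (optionally with manual supervised correction) rather than hand-writing them.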
The subject domain (which can be regarded as a feature structure) can be obtained in two ways: first, feature parameters are set by manual annotation; second, a large amount of basic data is captured and machine learning generates the feature parameters, and this second approach sometimes also incorporates manual supervised correction to achieve better results. Here, data or resources of the same type, together with the services provided around them, are called an information domain (domain); information domain data is generally structured table data with a primary key (main attribute), and an information domain is generally named with a noun. The information intent (intent) represents the user's operation on information domain data, such as querying, querying the value of a certain attribute, reserving, or dialing, and is generally named with a verb. The information intent object (object) describes the intent and is the parameter required to realize it, embodied by the main keywords of the text information obtained through word segmentation. Together, the information domain, information intent, and information intent object can be regarded as the subject domain structure of the subject domain.
For example, user D, who uses the search device 1 and whose speech feature vector serves as the reference speech feature vector, discusses returning to his hometown with friend E. User D says: "I have already booked the train ticket back to Qingdao." Friend E says: "I need to return to Qingdao soon, but I don't know whether a train ticket from Beijing to Qingdao can still be booked at present." In step S301, the search device 1 recognizes each piece of received voice information and obtains the corresponding speech feature vectors. In step S302, the search device 1 judges each piece of recognized voice information: the speech feature vector of the voice information uttered by user D matches the reference speech feature vector, so no further processing is performed; the speech feature vector of the voice information uttered by friend E ("I need to return to Qingdao soon, but I don't know whether a train ticket from Beijing to Qingdao can still be booked at present") does not match the reference speech feature vector, i.e., the utterer is not the predetermined user, so in step S303 the search device 1 converts that voice information into text, segments the text through semantic analysis, calculates feature values according to a preset feature model, matches the optimal subject domain, and fills the text content into the subject domain to obtain a simple subject domain structure:
domain: train;
intent: reserve;
intent object:
{
departure station: Beijing;
arrival station: Qingdao;
}
It will be understood by those skilled in the art that the subject domains described above are by way of example only, and that other subject domains, existing now or becoming available in the future, that are applicable to the present invention are also intended to be encompassed within its scope and are hereby incorporated by reference.
In step S304, the search device 1 maps and expands the subject domain into a corresponding search structure.
Specifically, in step S304, the search device 1 maps the keywords in the filled subject domain according to a preset rule, for example through a certain template, and expands them correspondingly, for example by generating additional keywords from the template's information and supplementing the keyword information in the subject domain, thereby obtaining a corresponding complete search structure.
In step S305, the search device 1 performs an active matching search according to the search structure, acquires the corresponding subject information, and presents it to the user. Specifically, in step S305 the search device 1 performs the active matching search according to the search structure determined in step S304 and obtains a large number of related search results; it then examines them: if a search result is already subject information, it is presented to the user directly; if not, the search device 1 extracts subject information from it and presents the refined, aggregated main information to the user through the screen of the smart device or the wearable headset that received the voice information, helping the user quickly acquire the specific content related to the voice information and obtain unknown information in a timely and accurate manner.
Here, in step S305, the search device 1 may obtain the corresponding subject information as follows: 1) for a single document, an automatic summarization technique, for example understanding-based or structure-based automatic summarization, is applied to the obtained search result to produce concise and coherent subject information; 2) for multiple documents, search results with similar content are first grouped by a clustering method and subject information is then extracted with a summarization technique, where the clustering method computes a feature value for the document corresponding to each search result and aggregates the documents into categories. The feature parameters and the seed document feature values required for computing each document's feature value are trained in advance through a training model.
The search apparatus 1 provides the user with a function for presetting a reference voice feature vector. The user presets the reference voice feature vector through this function; the apparatus then judges whether the voice feature vector of the received voice information matches the reference voice feature vector and uses the judgment result as the condition for deciding whether to start the search function. The searched information is organized by topic to obtain topic information, which is presented to the user. In this way, the apparatus can continuously and actively receive voice information during a conversation or in a complex voice environment, autonomously evaluate that voice information, understand and search for related information, and feed back relevant information or a summary to the user in real time, so that the user obtains unknown information quickly and the problem of information asymmetry in communication is alleviated. For example: 1) when the user is providing a consulting service, the reference voice feature vector is set to that of the user's own voice; the search apparatus 1 continuously and actively receives voice information, and if the sender of the received voice information is the user, the voice information is not processed further, whereas if the sender is someone else, a topic domain corresponding to the voice information is generated and a search is performed according to it, acquiring for the user information such as the background topic, specific knowledge points, or technical framework corresponding to the received voice information, so that an accurate and comprehensive professional answer can be given in a timely manner; 2) when the user is receiving medical services, the search apparatus 1 provides relevant explanations of and background for the doctor's statements in real time, so that the user can keep asking questions on that basis to obtain the desired information, which relieves the information asymmetry between doctor and patient to a certain extent; 3) during a negotiation, the user can answer a client's questions or concepts more professionally and comprehensively with the help of the search apparatus 1, facilitating the next step of cooperation; 4) in daily life, the user can draw on information provided by the search apparatus 1 in real time while talking with others, across multiple fields and subjects, improving the effectiveness and breadth of the conversation. Further, the search apparatus 1 raises the accessibility of information to a new level and offers a new acquisition mode to those who do not know how to obtain information.
Preferably, in step S303, the search apparatus 1 is configured to: 1) if the judgment result is a mismatch, translate the voice information into a natural language text based on recognition of the voice information; 2) perform natural semantic parsing on the natural language text to generate the topic domain corresponding to the voice information. Specifically, if the voice feature vector of the received voice information does not match the reference voice feature vector preset by the user, that is, the sender of the received voice information is not the predetermined user, then in step S303 the search apparatus 1 performs analog-to-digital conversion on the voice information to obtain its audio interval, extracts feature quantities from the audio data of that interval, and recognizes the voice information as the corresponding natural language text based on the feature vector. Then, in step S303, the search apparatus 1 performs natural semantic parsing on the natural language text to obtain a parsing result and generates the topic domain corresponding to the voice information from that result.
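The gating decision described above (search only when the speaker is not the preset user) can be sketched as a simple vector comparison. This is a hedged illustration: the vectors, the cosine-similarity test, and the threshold are all invented for demonstration; a real system would compare trained speaker embeddings.

```python
# Sketch of the trigger condition: the search pipeline runs only when the
# incoming voice feature vector does NOT match the user's preset reference
# vector. Vectors and threshold below are illustrative assumptions.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def should_search(voice_vec, reference_vec, threshold=0.85):
    """Return True when the speaker is NOT the preset user, so the
    apparatus proceeds to generate a topic domain and search."""
    return cosine_similarity(voice_vec, reference_vec) < threshold

user_ref = [0.9, 0.1, 0.4, 0.8]          # preset reference vector
same_speaker = [0.88, 0.12, 0.41, 0.79]  # utterance by the user
other_speaker = [0.1, 0.9, 0.7, 0.05]    # utterance by someone else

skip_search = should_search(same_speaker, user_ref)      # False: the user
trigger_search = should_search(other_speaker, user_ref)  # True: another speaker
```

In the consulting-service example, this is the check that lets the apparatus ignore the consultant's own speech and react only to the other party's questions.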
Here, in step S303, the search apparatus 1 uses natural semantic parsing to obtain the keywords in the voice information more accurately, and expands or prunes the keywords according to the semantics of the voice information, which improves the accuracy and hit rate of the search results and the user experience.
Preferably, the topic domain includes an information field, an information intent, and an information intent object, and in step S304 the search apparatus 1: determines a corresponding topic template according to the information field and information intent of the topic domain; generates an extension keyword list from the information intent object according to the topic template and fills the topic domain and the extension keyword list into the topic template; and maps and expands the filled topic template into the corresponding search structure.
Specifically, when the structure of the topic domain includes an information field, an information intent, and an information intent object, then after the text content converted from the voice information is filled into the topic domain, the information intent and the information intent object of the topic domain have corresponding keywords. In step S304, the search apparatus 1 retrieves a corresponding topic template from the topic template library through pattern matching on the information field and information intent of the topic domain. The information intent object (the main keyword) is combined with the preset rules in the topic template to generate an extension keyword list, and the content of the topic domain and the extension keyword list are filled into the corresponding positions of the topic template. An extension keyword may be an added keyword or a subtracted keyword; filling the extension keywords into the topic template makes its content more complete and improves the hit rate of the search. The information intent object (main keyword) and the extension keywords together constitute all the search keywords of the topic template and of the corresponding search structure. The topic template is preset with a search mode, for example a search instruction and a specific search structure preset for each search instruction (this specific search structure belongs to the topic template and is distinct from the search structure corresponding to a search engine), and the content of the topic template is mapped and expanded into attributes of the search structure corresponding to the preset search engine.
The topic template library is a preset database for storing topic templates; for example, a large number of voice information samples are collected and analyzed to obtain a large number of topic templates, which are then stored in the topic template library.
The extension keywords enrich the search keywords, making it easier to obtain the results the user wants, while also reducing noise in the search keywords and avoiding unnecessary searches, so the keywords are more accurate. By obtaining the corresponding topic template from the topic domain, filling the topic domain content and the extension keywords generated for the template into the topic template, and directly mapping the topic template into the search structure of the search engine, the accuracy of the search keywords is improved, the load on the search engine is reduced, the hit rate of the search is increased, more accurate matching results are presented to the user, and the user's ability to obtain information quickly in real time is enhanced.
For example, following the above example, the voice information recognized by the search apparatus 1 in step S301 is: "I have to go back to Qingdao soon, but I don't know whether a train ticket from Beijing to Qingdao can still be booked." In step S304, the search apparatus 1 retrieves a corresponding topic template from the topic template library through pattern matching on the generated information field (train) and information intent (reservation) of the topic domain, and fills the content of the topic domain into the topic template. The information intent object (main keywords: Beijing, Qingdao) is combined with the rules preset in the topic template; for example, the template presets the date limit for tickets to within one week and, given the information field "train", internally excludes transportation modes such as airplane, bus, and ship. This generates the extension keyword list: date: within one week; transportation mode: -airplane, -bus, -ship, where "within one week" is an added keyword and "airplane, bus, ship" are subtracted keywords. The topic domain content and the extension keywords are then filled into the retrieved topic template:
Topic template name: train & query
Search instruction:
Origin: {Beijing};
Destination: {Qingdao};
Date: {optional; if not set, defaults to within one week}
Priority class: {optional}
Excluded transportation modes: {airplane, bus, ship}
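The filled template above can be sketched as a plain data structure. The field names, template name format, and rule layout below are illustrative assumptions for demonstration, not the patent's actual template schema.

```python
# Sketch of filling the topic domain and extension keywords into a topic
# template (field names and rule format are assumptions, not the patent's).
topic_domain = {
    "information_field": "train",
    "information_intent": "reservation",
    "intent_objects": ["Beijing", "Qingdao"],  # main keywords
}

# Preset rules assumed to be stored with the "train & query" template.
template_rules = {
    "default_date_limit": "within one week",        # added keyword
    "excluded_modes": ["airplane", "bus", "ship"],  # subtracted keywords
}

def fill_template(domain, rules):
    """Fill the topic domain into the template and attach the extension
    keywords (added and subtracted) generated by the preset rules."""
    return {
        "template": f'{domain["information_field"]} & query',
        "origin": domain["intent_objects"][0],
        "destination": domain["intent_objects"][1],
        "date": rules["default_date_limit"],
        "excluded_traffic_modes": rules["excluded_modes"],
    }

search_struct = fill_template(topic_domain, template_rules)
```

The resulting dictionary mirrors the filled template listing above: origin and destination come from the main keywords, while the date limit and excluded modes are the extension keywords supplied by the template's rules.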
If the search apparatus 1 performs the search using the Baidu search engine, the content of the topic template is mapped to the search structure corresponding to the Baidu search engine, for example the search structure shown in FIG. 2, and an active matching search is performed to obtain the corresponding topic information and present it to the user. FIG. 2 illustrates a search structure framework when using the Baidu search engine, according to one embodiment of the invention.
Preferably, in step S304, the search apparatus 1 maps and expands the topic domain into the search structure according to the attributes of the search engine used for the active matching search. Specifically, in step S304, the search apparatus 1 constructs a search instruction from the information field, the information intent, the information intent object, or any combination of the three in the topic domain, and then generates a corresponding search structure according to the attributes of that search engine. For example, when the user searches with the Baidu search engine, the search structure framework corresponding to the topic domain generated by recognizing the received user voice information is as shown in FIG. 2; according to the received voice information, the keywords of the corresponding search structure are automatically filled into the input field of FIG. 2 to perform the search operation.
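This engine-specific mapping step can be sketched as serializing the filled search structure into a query string. The `-term` exclusion operator used here is an assumption about the target engine's query syntax, not a documented property of any particular engine.

```python
# Hypothetical mapping from the filled search structure to a query string
# for a general-purpose search engine ("-term" exclusion is assumed syntax).
def to_query(struct):
    parts = [struct["origin"], struct["destination"],
             struct["template"].split(" & ")[0],  # "train" from "train & query"
             "ticket",
             struct["date"]]
    parts += [f"-{mode}" for mode in struct["excluded_traffic_modes"]]
    return " ".join(parts)

struct = {
    "template": "train & query",
    "origin": "Beijing",
    "destination": "Qingdao",
    "date": "within one week",
    "excluded_traffic_modes": ["airplane", "bus", "ship"],
}
query = to_query(struct)
```

A different engine would get a different serialization from the same structure, which is the point of keeping the topic template separate from the engine-specific search structure.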
Preferably, in step S305, the search apparatus 1 performs the active matching search according to the search structure corresponding to the topic domain in combination with cognitive computing, and obtains and presents the corresponding topic information to the user. Specifically, in step S305, the search apparatus 1 matches information meeting the search structure corresponding to the topic domain and combines this with cognitive computing to obtain the topic information, which it then presents to the user. For example, it learns from and analyzes the keyword information in the search structure, perceives the user's requirements from that keyword information, obtains highly relevant information for the user, organizes the search results by topic, obtains the corresponding topic information, and presents it to the user, thereby improving the intelligence of the search apparatus 1 in performing automatic matching search. Cognitive computing processes unstructured information by stacking layers of simple computing units, thereby simulating the thinking mode and cognition of the human brain.
Preferably, step S305 includes: sub-step S3051 (not shown), sub-step S3052 (not shown), and sub-step S3053 (not shown).
In sub-step S3051, the search apparatus 1 performs an active matching search according to the search structure corresponding to the topic domain and obtains the corresponding candidate information. Specifically, in sub-step S3051, the search apparatus 1 performs an automatic matching search according to the search structure and obtains a large number of related search results. Because the display interface of the search apparatus 1 is limited and the user's time for absorbing information is also limited, this large set of related search results is treated as candidate information, which is refined before being presented to the user.
In sub-step S3052, the search apparatus 1 generates the topic information based on information extraction and integration of the candidate information. Specifically, a resource and rule base is preset; in sub-step S3052, the search apparatus 1 determines a topic according to the search structure, extracts description vectors related to the topic from the information of the search results using the resources and rules stored in the base, and generates global information related to the topic from those description vectors as the topic information, so that the user can quickly acquire the related knowledge. For example, when the user inquires about "urticaria", the search apparatus 1 obtains a large number of search results; because of the apparatus's limitations, all the results cannot be presented to the user at once, so to let the user quickly acquire the main knowledge, in sub-step S3052 the search apparatus 1 summarizes urticaria into the following topic information: also known as wind rash or wheals; a skin disease; sudden appearance of red swellings on local or systemic skin; rapid onset and rapid disappearance; intense itching; allergic in nature. After seeing this global information, the user has a general understanding of urticaria.
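The extraction-and-integration sub-step can be sketched as filtering candidate results against a preset rule base and merging the matching statements. This is a deliberately simplified illustration: the rule base, cue stems, and sample candidates below are invented for demonstration and stand in for the patent's unspecified resource and rule base.

```python
# Sketch of sub-step S3052: extract topic-related statements from candidate
# results using a small preset rule base, merging them into global topic
# information. The rule base and candidates are illustrative assumptions.
rule_base = {
    "urticaria": ["wheal", "skin", "itch", "allerg"],  # topic cue stems
}

def extract_topic_info(topic, candidates):
    """Keep deduplicated sentences containing any cue stem for the topic,
    as the refined topic information to present to the user."""
    cues = rule_base[topic]
    seen, info = set(), []
    for doc in candidates:
        for sentence in doc.split("."):
            s = sentence.strip().lower()
            if s and s not in seen and any(cue in s for cue in cues):
                seen.add(s)
                info.append(sentence.strip())
    return info

candidates = [
    "Urticaria presents as sudden red wheals. It often resolves quickly.",
    "Intense itching is typical. Many cases are allergic reactions.",
]
summary = extract_topic_info("urticaria", candidates)
```

Only the cue-bearing sentences survive, approximating the refined global information the user sees instead of the full result list.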
In sub-step S3053, the search apparatus 1 presents the topic information to the user. Specifically, the manners of presenting the topic information to the user include, but are not limited to: presentation through an audio device; presentation through a video device; presentation through a wearable device. The audio device is, for example, a smart headset or smart earphones; the video device is, for example, Virtual Reality (VR) glasses or smart glasses that can switch freely between an ordinary glasses function and a projection function; the wearable device includes a smart watch, a smart helmet, smart clothing, and the like.
It is noted that the present invention may be implemented in software and/or in a combination of software and hardware, for example, the various means of the invention may be implemented using Application Specific Integrated Circuits (ASICs) or any other similar hardware devices. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Also, the software programs (including associated data structures) of the present invention can be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Further, some of the steps or functions of the present invention may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (13)

1. An active type search method based on speech recognition, wherein the method comprises the following steps:
a. recognizing the received voice information and obtaining a voice feature vector of the voice information;
b. judging whether the voice feature vector matches a reference voice feature vector preset by a user, and obtaining a corresponding judgment result;
c. if the judgment result is a mismatch, generating a topic domain corresponding to the voice information based on the recognition of the voice information;
d. mapping and expanding the topic domain into a corresponding search structure;
e. performing an active matching search according to the search structure, acquiring corresponding topic information, and presenting the topic information to the user;
wherein the step e comprises:
performing an active matching search according to the search structure to obtain corresponding candidate information;
generating the topic information based on information extraction and integration of the candidate information;
and presenting the topic information to the user.
2. The method of claim 1, wherein the step c comprises:
if the judgment result is a mismatch, translating the voice information into a natural language text based on the recognition of the voice information;
and performing natural semantic parsing on the natural language text to generate a topic domain corresponding to the voice information.
3. The method of claim 1 or 2, wherein the topic domain comprises at least any one of:
an information field;
an information intent;
an information intent object.
4. The method of claim 3, wherein the topic domain comprises an information field, an information intent, and an information intent object, wherein step d comprises:
determining a corresponding topic template according to the information field and the information intent of the topic domain;
generating an extension keyword list from the information intent object according to the topic template, and filling the topic domain and the extension keyword list into the topic template;
and mapping and expanding the filled topic template into a corresponding search structure.
5. The method according to any one of claims 1 to 4, wherein step d comprises:
mapping and expanding the topic domain into the search structure according to the attributes of the search engine corresponding to the active matching search.
6. The method according to any one of claims 1 to 5, wherein step e comprises:
performing an active matching search according to the search structure in combination with cognitive computing to obtain corresponding candidate information;
generating the topic information based on information extraction and integration of the candidate information;
and presenting the topic information to the user.
7. An active type search apparatus based on voice recognition, wherein the search apparatus comprises:
recognition means for recognizing the received voice information and obtaining a voice feature vector of the voice information;
judging means for judging whether the voice feature vector matches a reference voice feature vector preset by a user, and obtaining a corresponding judgment result;
generating means for generating, if the judgment result is a mismatch, a topic domain corresponding to the voice information based on the recognition of the voice information;
mapping means for mapping and expanding the topic domain into a corresponding search structure;
presentation means for performing an active matching search according to the search structure, acquiring corresponding topic information, and presenting the topic information to the user;
wherein the presentation means comprises:
an acquisition unit for performing an active matching search according to the search structure to obtain corresponding candidate information;
an integration unit for generating the topic information based on information extraction and integration of the candidate information;
and a presentation unit for presenting the topic information to the user.
8. The search apparatus of claim 7, wherein the generating means is configured to:
if the judgment result is a mismatch, translate the voice information into a natural language text based on the recognition of the voice information;
and perform natural semantic parsing on the natural language text to generate a topic domain corresponding to the voice information.
9. The search apparatus according to claim 7 or 8, wherein the topic domain comprises at least any one of:
an information field;
an information intent;
an information intent object.
10. The search apparatus of claim 9, wherein the topic domain comprises an information field, an information intent, and an information intent object, wherein the mapping means is configured to:
determine a corresponding topic template according to the information field and the information intent of the topic domain;
generate an extension keyword list from the information intent object according to the topic template, and fill the topic domain and the extension keyword list into the topic template;
and map and expand the filled topic template into a corresponding search structure.
11. The search apparatus according to any one of claims 7 to 10, wherein the mapping means is configured to:
map and expand the topic domain into the search structure according to the attributes of the search engine corresponding to the active matching search.
12. The search apparatus according to any one of claims 7 to 11, wherein the presentation means is specifically configured to:
perform an active matching search according to the search structure in combination with cognitive computing to obtain corresponding candidate information;
generate the topic information based on information extraction and integration of the candidate information;
and present the topic information to the user.
13. A computer device, the computer device comprising:
one or more processors;
a memory for storing one or more computer programs;
the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-6.
CN201611091688.4A 2016-12-01 2016-12-01 Active type searching method and device based on voice recognition Active CN108132952B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201611091688.4A CN108132952B (en) 2016-12-01 2016-12-01 Active type searching method and device based on voice recognition
PCT/CN2017/076968 WO2018098932A1 (en) 2016-12-01 2017-03-16 Proactive searching method and device based on speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611091688.4A CN108132952B (en) 2016-12-01 2016-12-01 Active type searching method and device based on voice recognition

Publications (2)

Publication Number Publication Date
CN108132952A CN108132952A (en) 2018-06-08
CN108132952B true CN108132952B (en) 2022-03-15

Family

ID=62241092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611091688.4A Active CN108132952B (en) 2016-12-01 2016-12-01 Active type searching method and device based on voice recognition

Country Status (2)

Country Link
CN (1) CN108132952B (en)
WO (1) WO2018098932A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109343806B (en) * 2018-08-09 2023-07-21 维沃移动通信有限公司 Information display method and terminal
CN109065055B (en) * 2018-09-13 2020-12-11 三星电子(中国)研发中心 Method, storage medium, and apparatus for generating AR content based on sound
CN111178081B (en) * 2018-11-09 2023-07-21 中移(杭州)信息技术有限公司 Semantic recognition method, server, electronic device and computer storage medium
CN111291168A (en) * 2018-12-07 2020-06-16 北大方正集团有限公司 Book retrieval method and device and readable storage medium
CN110060681A (en) * 2019-04-26 2019-07-26 广东昇辉电子控股有限公司 The control method of intelligent gateway with intelligent sound identification function
CN110853615B (en) * 2019-11-13 2022-05-27 北京欧珀通信有限公司 Data processing method, device and storage medium
CN111105796A (en) * 2019-12-18 2020-05-05 杭州智芯科微电子科技有限公司 Wireless earphone control device and control method, and voice control setting method and system
CN111343022A (en) * 2020-02-28 2020-06-26 上海万得维进出口有限公司 Method and system for realizing network configuration processing of intelligent equipment by directly interacting with user
CN112562652B (en) * 2020-12-02 2024-01-19 湖南翰坤实业有限公司 Voice processing method and system based on Untiy engine
CN112800782B (en) * 2021-01-29 2023-10-03 中国科学院自动化研究所 Voice translation method, system and equipment integrating text semantic features

Citations (6)

Publication number Priority date Publication date Assignee Title
CN102467541A (en) * 2010-11-11 2012-05-23 腾讯科技(深圳)有限公司 Situational searching method and system
CN102497391A (en) * 2011-11-21 2012-06-13 宇龙计算机通信科技(深圳)有限公司 Server, mobile terminal and prompt method
CN102880645A (en) * 2012-08-24 2013-01-16 上海云叟网络科技有限公司 Semantic intelligent search method
CN104836720A (en) * 2014-02-12 2015-08-12 北京三星通信技术研究有限公司 Method for performing information recommendation in interactive communication, and device
CN105095406A (en) * 2015-07-09 2015-11-25 百度在线网络技术(北京)有限公司 Method and apparatus for voice search based on user feature
CN105279227A (en) * 2015-09-11 2016-01-27 百度在线网络技术(北京)有限公司 Voice search processing method and device of homonym

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN101075435B (en) * 2007-04-19 2011-05-18 深圳先进技术研究院 Intelligent chatting system and its realizing method
KR101878488B1 (en) * 2011-12-20 2018-08-20 한국전자통신연구원 Method and Appartus for Providing Contents about Conversation
US9495350B2 (en) * 2012-09-14 2016-11-15 Avaya Inc. System and method for determining expertise through speech analytics
KR101537370B1 (en) * 2013-11-06 2015-07-16 주식회사 시스트란인터내셔널 System for grasping speech meaning of recording audio data based on keyword spotting, and indexing method and method thereof using the system
CN105159568A (en) * 2015-08-31 2015-12-16 百度在线网络技术(北京)有限公司 Music searching method and device in input interface
CN105068661B (en) * 2015-09-07 2018-09-07 百度在线网络技术(北京)有限公司 Man-machine interaction method based on artificial intelligence and system


Also Published As

Publication number Publication date
CN108132952A (en) 2018-06-08
WO2018098932A1 (en) 2018-06-07

Similar Documents

Publication Publication Date Title
CN108132952B (en) Active type searching method and device based on voice recognition
KR102535338B1 (en) Speaker diarization using speaker embedding(s) and trained generative model
US20190079724A1 (en) Intercom-style communication using multiple computing devices
EP3631793B1 (en) Dynamic and/or context-specific hot words to invoke automated assistant
CN111832308B (en) Speech recognition text consistency processing method and device
CN107832720B (en) Information processing method and device based on artificial intelligence
US20230014775A1 (en) Intelligent task completion detection at a computing device
CN111213136A (en) Generation of domain-specific models in networked systems
CN110019824A (en) Man-machine interaction method, the apparatus and system of knowledge based map
KR20190046062A (en) Method and apparatus of dialog scenario database constructing for dialog system
CN114064943A (en) Conference management method, conference management device, storage medium and electronic equipment
US20240055003A1 (en) Automated assistant interaction prediction using fusion of visual and audio input
CN105869631B (en) The method and apparatus of voice prediction
CN111508530B (en) Speech emotion recognition method, device and storage medium
CN106844734B (en) Method for automatically generating session reply content
KR20130068624A (en) Apparatus and method for recognizing speech based on speaker group
CN114461749A (en) Data processing method and device for conversation content, electronic equipment and medium
Nikam et al. Covid-19 Android chatbot using RASA
CN111582708A (en) Medical information detection method, system, electronic device and computer-readable storage medium
JP2020077272A (en) Conversation system and conversation program
JP7169030B1 (en) Program, information processing device, information processing system, information processing method, information processing terminal
CN114036373B (en) Searching method and device, electronic equipment and storage medium
US20230214413A1 (en) Information recommendation system, information search device, information recommendation method, and program
US20220020371A1 (en) Information processing apparatus, information processing system, information processing method, and program
KR20170118465A (en) Integrated system and method for voice analysis and situation deduction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant