US11138278B2 - Method for querying long-form speech - Google Patents

Method for querying long-form speech

Info

Publication number
US11138278B2
Authority
US
United States
Prior art keywords
matrix
query
transcript
word
literals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US16/109,553
Other languages
English (en)
Other versions
US20200065420A1 (en)
Inventor
Anthony Scodary
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gridspace Inc
Gridspace Inc USA
Original Assignee
Gridspace Inc USA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US16/109,553 priority Critical patent/US11138278B2/en
Application filed by Gridspace Inc USA filed Critical Gridspace Inc USA
Assigned to GRIDSPACE INC. reassignment GRIDSPACE INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SCODARY, ANTHONY
Priority to PCT/US2019/046700 priority patent/WO2020041098A1/en
Priority to EP19851374.9A priority patent/EP3841488A4/en
Priority to JP2021534120A priority patent/JP7293359B2/ja
Publication of US20200065420A1 publication Critical patent/US20200065420A1/en
Priority to US17/394,800 priority patent/US11880420B2/en
Publication of US11138278B2 publication Critical patent/US11138278B2/en
Application granted granted Critical
Assigned to USAA PROPERTY HOLDINGS, INC. reassignment USAA PROPERTY HOLDINGS, INC. SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GRIDSPACE, INC.
Priority to US18/524,697 priority patent/US20240095292A1/en
Assigned to GUAVA, INC. (FORMERLY GRIDSPACE, INC.) reassignment GUAVA, INC. (FORMERLY GRIDSPACE, INC.) RELEASE OF SECURITY INTEREST Assignors: USAA PROPERTY HOLDINGS, INC.
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing

Definitions

  • Search query engines may be utilized to determine whether words or phrases were used in a text document.
  • Conventional search query engines focus on the actual word or phrase that was used instead of the meaning of that word or phrase.
  • those conventional search engines are neither accurate nor efficient. Thus, they may be of limited use in real-time search query applications, or even overall. Additionally, conventional search query engines do not search speech transcripts that are enriched with emotional metadata for concepts.
  • the search query engine converts a search query into a tree of operations using literals and operators.
  • the query and a transcript may then each be converted into a matrix of word embeddings that represent the meaning of each word, and the cross-correlation of the two matrices is computed to find matches.
  • the cross-correlation of large transcript matrices may be accelerated by utilizing the Fourier transform of the matrix.
  • Matches are then those dot products that fall within a softness threshold as determined by a softness map.
  • non-speech data e.g., emotions or speaker role
  • FIG. 1 illustrates an embodiment of a communication system 100 .
  • FIG. 2 illustrates an embodiment of a query method 200 .
  • FIG. 3 illustrates an embodiment of a tree of operations generation method 300 .
  • FIG. 4 illustrates an embodiment of query types 400 .
  • FIG. 5 illustrates an embodiment of a query tree 500 .
  • FIG. 6 illustrates an embodiment of a word embedding method 600 .
  • FIG. 7 illustrates an embodiment of a fast Fourier transformation system 700 .
  • FIG. 8 illustrates an embodiment of a fast Fourier transformation method 800 .
  • FIG. 9 illustrates an embodiment of a match combination method 900 .
  • FIG. 10 illustrates an embodiment of a communication system 1000 .
  • FIG. 11 illustrates an embodiment of a query method 1100 .
  • FIG. 12 illustrates an embodiment of a sparse quantitative thesaurus matrix generation method 1200 .
  • FIG. 13 illustrates a thesaurus 1300 in accordance with one embodiment.
  • FIG. 14 is an example block diagram of a computing device 1400 that may incorporate embodiments of the present invention.
  • the algorithms described herein may be executed by a data processing device to return results much faster from unstructured or lightly structured data sources, such as data files that are machine-generated speech-to-text transcripts of multi-participant voice conferences.
  • the new algorithms utilize a combination of processing that is particularly efficient for execution on speech-to-text converted transcript files, using the instruction set architecture of modern data processing integrated circuits such as central processing units (CPUs) and graphics processing units (GPUs).
  • a communication system 100 comprises a first person 102 , a second person 104 , a network 106 , an audio transformation system 108 , a speech to text converter 110 , an analog to digital converter 112 , an enrichment logic 114 , a digital transcript 116 , a third person 118 , a query engine 120 , a query parser 122 , a matrix generator 124 , a query word embedding matrix 126 , a transcript word embedding matrix 128 , a cross-correlator 130 , a comparator 132 , a softness map 134 , and a combiner 136 .
  • the first person 102 is in audio communication with a second person 104 over a network 106 , for example an IP network, analog telephone network, or cellular network.
  • Audio from the communications may be recorded, or streamed live to an audio transformation system 108 , which converts the audio to metadata-enriched text.
  • the audio transformation system 108 may comprise a speech to text converter 110 and enrichment logic 114 to transform the audio into the enriched text. If the audio is in an analog format, the audio transformation system 108 may utilize an analog to digital converter 112 to convert to a digital format before providing the digital audio to the speech to text converter 110 .
  • the enriched text of the audio is output in the form of one or more digital files of a digital transcript 116 .
  • a third person 118 may search the digital transcript 116 using queries.
  • the queries, along with the digital transcript 116 , are operated on by a query engine 120 .
  • the query engine 120 may be operated according to the process depicted in FIG. 2 .
  • the query engine 120 inputs the query to a query parser 122 to generate a tree of operations from words (literals) and operators of the query.
  • the query parser 122 may generate the tree of operations in accordance with the process depicted in FIG. 3 .
  • the query parser 122 may further utilize the query types 400 to parse the query into the tree of operations.
  • the literals and the digital transcript 116 are input to a matrix generator 124 to generate a query word embedding matrix 126 and a transcript word embedding matrix 128 .
  • the matrices may be generated in accordance with the process depicted in FIG. 6 .
  • the query word embedding matrix 126 and the transcript word embedding matrix 128 are input to a cross-correlator 130 to generate dot products, which are input to a comparator 132 .
  • a fast Fourier transformation system may be utilized to generate the dot products. An embodiment of this system is depicted in FIG. 7 .
  • the comparator 132 identifies matches from dot products that fall within a softness threshold as determined by a softness map 134 .
  • the matches are combined (combiner 136 ) based on the operators extracted from the query by the query parser 122 .
  • the combiner 136 may be operated in accordance with the process depicted in FIG. 9 .
  • the combiner 136 generates an output.
  • the combiner 136 may limit the number of outputs to a number of highest results, all results, or no results if the final weight is too low.
  • a query method 200 receives a transcript (block 202 ).
  • the transcript may be the digital transcript 116 discussed in FIG. 1 .
  • a query is then received (block 204 ).
  • the query may comprise the query types discussed in reference to FIG. 4 .
  • the query is then transformed into a tree of operations comprising literals and operators (block 206 ).
  • the tree of operations comprises operators as the stems and literals as the leaves.
  • the operators may be unary or binary, that is, one or two connections, respectively, to a lower level on the tree of operations.
  • the tree of operations may be generated by the process depicted in FIG. 3 . Other tree generating algorithms may be utilized.
  • the literals and the transcript are each transformed into a matrix of word embeddings (block 208 ).
  • the word embeddings may be stored in a control memory structure.
  • the word embeddings may be multi-dimensional, such as 50-1000 dimensions. 300-dimension word embedding may be utilized to optimize efficiency and accuracy.
  • the word embedding may be generated in accordance with the process depicted in FIG. 6 .
  • the dimension of embeddings may be expanded to store other non-speech information. For example, the speaker role (“agent” vs. “caller”) or the emotional content (0.0-1.0 based on how angry the speaker was) may be included.
  • the dimension of the transcript embeddings may be extended to include the model outputs and the query embedding can be extended to include the query flag (0 vs 1 or similar).
  • an additional “301st” or more dimension is included to represent the non-speech information.
  • this same method may be utilized, or a search index may be utilized to filter down transcript segments that match that metadata flag.
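  • The extra dimension for non-speech information can be illustrated with a minimal sketch. It assumes 3-dimensional stand-ins for the 300-dimension embeddings, a per-word list of speaker roles, and hypothetical helper names (`extend_transcript`, `extend_query`); none of these come from the text itself.

```python
import numpy as np

def extend_transcript(T, roles, target_role):
    """Append one column that is 1.0 where the speaker has target_role, else 0.0."""
    flags = np.array([1.0 if role == target_role else 0.0 for role in roles])
    return np.hstack([T, flags[:, None]])

def extend_query(Q, flag):
    """Append the query flag (0 or 1) as the extra dimension of every query row."""
    return np.hstack([Q, np.full((Q.shape[0], 1), float(flag))])

# Hypothetical 3-dimensional embeddings standing in for 300-dimensional ones.
T = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])  # the same word said twice
roles = ["agent", "caller"]                        # who said each word
Q = np.array([[1.0, 0.0, 0.0]])

T_ext = extend_transcript(T, roles, "agent")  # shape (2, 4)
Q_ext = extend_query(Q, 1)                    # shape (1, 4)
scores = T_ext @ Q_ext[0]                     # dot product per transcript word
```

  • In this toy case the word spoken by the agent scores higher than the identical word spoken by the caller, because the appended dimension contributes to the dot product only when the role flag is set.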
  • Each literal may have its own query matrix.
  • C is the cross-correlation
  • T is the transcript matrix
  • Q is the query matrix
  • l is the length of the transcript matrix, which is determined based on the number of words in the transcript.
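  • The symbols above can be read as a sliding dot product. The following is a minimal sketch under that reading, not the patent's own formula: it assumes T has one row per transcript word and Q has one row per query word, and it scores each offset by summing the row-wise dot products. The function name `cross_correlate` and the toy vectors are illustrative assumptions.

```python
import numpy as np

def cross_correlate(T, Q):
    """Slide Q over T and sum the row-wise dot products at every offset.

    T: transcript matrix, shape (l, d), one embedding per transcript word.
    Q: query matrix, shape (m, d), one embedding per query word, with m <= l.
    Returns C of shape (l - m + 1,), where C[n] = sum_j T[n + j] . Q[j].
    """
    l, m = T.shape[0], Q.shape[0]
    return np.array([np.sum(T[n:n + m] * Q) for n in range(l - m + 1)])

# Toy 2-dimensional unit embeddings (hypothetical values, for illustration only).
T = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q = np.array([[0.0, 1.0], [1.0, 0.0]])

C = cross_correlate(T, Q)
best = int(np.argmax(C))  # offset where the query phrase aligns best
```

  • The direct loop costs on the order of l × m dot products, which is why a Fourier-based route becomes attractive for long transcripts.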
  • the cross-correlation is determined utilizing the Fourier transform of the matrices and the convolution theorem.
  • An exemplary system is depicted in FIG. 7 and the process is depicted in FIG. 8 .
  • a threshold matrix size may be utilized to determine whether the fast Fourier transformation system 700 is utilized.
  • a cross-correlation may be determined for each literal. The cross-correlation is then compared to a softness map to determine matches (block 212 ).
  • the softness map may be based on the degree of softness for the given literal(s).
  • the softness map returns thresholds for each literal as each literal may have a different softness.
  • the cross-correlation is compared to the threshold from the softness map to determine matches. Those cross-correlations for each literal that exceed the threshold are determined to be matches.
  • the matches and operators are utilized to execute the tree of operations to return the output (block 214 ). This may be performed in accordance with the process depicted in FIG. 9 .
  • the query method 200 may also utilize a shunting-yard algorithm to determine the output. Each operator may have composition rules stored to determine the effect on the matches.
  • the matches for each literal may replace the literal in the tree of operations and multiple permutations of the tree of operations performed if multiple matches are determined for a literal.
  • a further threshold may be applied prior to output to eliminate those outputs with a low weight.
  • the output may comprise the location of the match, the weight of the match, the query, the match, any extractions, etc.
  • a query may be performed on a phrase. While the cross correlation behaves well on longer phrases, word ordering affects meaning. As such, being out of order may be penalized while permitting some word reordering.
  • One method is to convolve the transcript embedding matrix with a kernel (e.g., a Gaussian kernel) in a soft query. This blurs the location of words by a few places, allowing word reordering to be tolerated to some degree.
  • K is the Kernel
  • l is the length of the matrix to be blurred.
  • K = [0.05, 0.1, 0.7, 0.1, 0.05].
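  • The blurring step can be sketched as a one-dimensional convolution along the word axis. This is an illustration under assumptions: the example kernel values are taken from the text, while the function name `blur_positions`, the "same"-length boundary handling, and the one-hot toy matrix are choices made here.

```python
import numpy as np

K = np.array([0.05, 0.1, 0.7, 0.1, 0.05])  # example kernel values from the text

def blur_positions(T, kernel=K):
    """Convolve each embedding dimension along the word axis with the kernel."""
    columns = [np.convolve(T[:, d], kernel, mode="same") for d in range(T.shape[1])]
    return np.stack(columns, axis=1)

# One-hot stand-ins for the embeddings of a 5-word transcript (toy data).
T = np.eye(5)
B = blur_positions(T)
# Row 2 of B now mixes in words 0-4, weighted by their distance from position 2.
```

  • After blurring, a word keeps most of its weight at its own position and leaks a small amount to its neighbours, so a query phrase with two adjacent words swapped still scores, only lower.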
  • a tree of operations generation method 300 receives a query (block 302 ). The presence of compound query indicator(s) is determined (block 304 ). If the tree of operations generation method 300 determines (decision block 306 ) that an indicator is not present, the tree of operations generation method 300 determines whether an operator is present (decision block 308 ). If not, the literal is determined (block 310 ). Any modifiers to the literal, such as the softness, are also determined. If an operator is determined to be present, that operator is determined (block 312 ). The operator is then sent to block 318 .
  • the innermost indicator is initialized (block 314 ).
  • the indicator may be a set of parentheses. Mathematical operations may be utilized to determine which indicator is the innermost. If two indicators may both be considered innermost, one is selected. One such scheme is to select the indicator that is first from left to right.
  • the innermost operator is then determined and set as the current operator (block 316 ).
  • a counter is set to “1” (block 318 ).
  • the counter may generally be initialized to any number or other value in other embodiments.
  • the current operator is placed at level “counter+1” (block 320 ).
  • the literal(s) are determined for the current operator (block 322 ). Those literals are placed at level “counter” and connected to current operator (block 324 ).
  • the tree of operations generation method 300 determines whether there is another indicator or operator (decision block 326 ). If so, the current operator is stored as a “literal” for the next connected operator at a higher level (block 328 ). The next indicator is determined (block 330 ). In cases where another operator is detected but no indicator is determined, the tree of operations generation method 300 may treat that operator as being in an indicator. The counter is incremented if the next indicator is at a higher level (block 332 ). The next operator is determined (block 334 ). The next operator is set as the current operator (block 336 ). The tree of operations generation method 300 then begins from block 320 . Once only a literal is determined or there are no additional operators or indicators, the tree of operations generation method 300 ends (done block 338 ).
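  • A tree of the same shape can be produced by a small recursive-descent parser. This sketch swaps in that technique in place of the counter-based procedure of method 300, since the text notes that other tree generating algorithms may be utilized. It assumes straight ASCII quotes, tildes as softness indicators, and binary operators only; segment modifiers, extractors, and time operators are not handled. The names `tokenize`, `parse`, and `levels` are hypothetical.

```python
import re

# Binary operators named in the text; everything else here is illustrative.
BINARY = {"near", "or", "then", "and", "later"}
TOKEN = re.compile(r'\s*(?:(~*)"([^"]*)"|(\()|(\))|([a-z]+))')

def tokenize(query):
    """Split a query into literal, parenthesis, and operator-word tokens."""
    query, pos, tokens = query.rstrip(), 0, []
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}")
        tildes, text, lpar, rpar, word = m.groups()
        if text is not None:
            tokens.append(("lit", text, len(tildes)))  # softness = tilde count
        elif lpar:
            tokens.append(("(",))
        elif rpar:
            tokens.append((")",))
        else:
            tokens.append(("word", word))
        pos = m.end()
    return tokens

def parse(tokens):
    """Build nested ("op", name, left, right) nodes with literals as leaves."""
    def operand(i):
        if i >= len(tokens):
            raise ValueError("query ended where a literal was expected")
        tok = tokens[i]
        if tok[0] == "(":                      # compound query: parse the inside first
            node, i = expression(i + 1)
            if i >= len(tokens) or tokens[i][0] != ")":
                raise ValueError("missing closing parenthesis")
            return node, i + 1
        if tok[0] == "lit":
            return tok, i + 1
        raise ValueError(f"unexpected token {tok}")

    def expression(i):
        node, i = operand(i)
        # Operators without parentheses are applied left to right.
        while i < len(tokens) and tokens[i][0] == "word" and tokens[i][1] in BINARY:
            op = tokens[i][1]
            right, i = operand(i + 1)
            node = ("op", op, node, right)
        return node, i

    node, i = expression(0)
    if i != len(tokens):
        raise ValueError("trailing tokens after the query")
    return node

def levels(node):
    """Literals sit at level 1; an operator sits one level above its deepest child."""
    if node[0] == "lit":
        return 1
    return 1 + max(levels(node[2]), levels(node[3]))

tree = parse(tokenize('~~"lost" and (~~"debit" then "card")'))
```

  • For the example query, the inner operator "then" ends up one level above its two literals and "and" ends up at the third level, with the literal "lost" and the "then" node as its children.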
  • query types 400 that may be stored in a query control memory structure 402 are depicted.
  • the query types 400 may comprise literals 404 , phrase operators 406 , conversation operators 408 , segment modifiers 410 , compound queries 412 , extractors 414 , time operators 416 , and metadata 418 .
  • the above does not constitute an exhaustive list of the query types 400 .
  • the literals 404 are extracted from a query and compared to the transcript.
  • the literals 404 may be indicated by quotations around a word or phrase.
  • the literals 404 may be “crash”, “lost credit card”, etc. Single quotes may be utilized as well in some embodiments, such as ‘crash’.
  • other indicators for the literals 404 may be utilized. The indicators are utilized to determine which text is to be compared to the transcript.
  • the literals 404 have an associated softness.
  • the literals 404 may have a default softness of 0.
  • this softness may be increased by a softness indicator, such as one or more tildes (~) added before the quoted word or phrase to “loosen up” similar matches (semantically similar, that is, similar in meaning, not in sound).
  • one tilde matches similar forms like plurals or conjugates. For example, ~“crash” matches “crashes” or “crashing”. Two tildes match synonymous words. For example, ~~“crash” matches “accident” or “collision”. Three tildes match related phrasings. For example, ~~~“have a nice day” matches “i hope your day is great”.
  • the softness associated with the literals 404 may be utilized to determine a threshold value for potential matches and incorporated into a softness map.
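  • One way to picture the softness map is as a lookup from tilde count to a similarity threshold. The sketch below assumes straight ASCII quotes and invents the threshold values, since the text does not give any; `parse_literal`, `softness_map`, and `SOFTNESS_THRESHOLDS` are hypothetical names.

```python
# Hypothetical thresholds on a similarity score; the text does not give values.
# Softness 0 demands a near-exact match, softness 3 accepts related phrasings.
SOFTNESS_THRESHOLDS = {0: 0.99, 1: 0.85, 2: 0.70, 3: 0.55}

def parse_literal(token):
    """Split a literal such as ~~"crash" into (text, softness)."""
    stripped = token.lstrip("~")
    softness = len(token) - len(stripped)  # number of leading tildes
    return stripped.strip('"'), softness

def softness_map(literals):
    """Return one threshold per literal, since each literal may have its own softness."""
    parsed = [parse_literal(token) for token in literals]
    return {text: SOFTNESS_THRESHOLDS[softness] for text, softness in parsed}

def is_match(score, text, thresholds):
    """A score counts as a match when it reaches the literal's threshold."""
    return score >= thresholds[text]

thresholds = softness_map(['"honda"', '~~"crash"', '~~~"have a nice day"'])
```

  • With these assumed values, a score of 0.75 would count as a match for ~~"crash" but not for the hard literal "honda".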
  • phrase operators 406 are utilized to search within a speech segment for two things (e.g., the literals 404 ).
  • Exemplary phrase operators 406 include “near”, “or”, or “then”. For example, a query for ~~“crash” near “honda” looks for both ~~“crash” and “honda”. The query ~~“crash” or “ticket” looks for either ~~“crash” or “ticket” or both. The query ~~“crash” then “police report” looks for both ~~“crash” and “police report” in order.
  • the phrase operators 406 are placed within a tree of operations and utilized to combine the matches of the literals 404 , if any.
  • the conversation operators 408 are utilized to search across an entire conversation for two things.
  • Exemplary conversation operators 408 include “and”, “or”, and “later”.
  • the “and” operator looks for a conversation that contains both literals. The query ~~“lost card” and “two weeks” may match a conversation that looks like this:
  • the “near” operator may not match, because they span different speech segments.
  • the “or” operator looks for a conversation that contains either literal or both. Its use is determined by context relative to the phrase scanner.
  • the query caller ~~“lost card” or caller “two weeks” may match the following conversation:
  • the “later” operator looks for a conversation that contains both literals in order. For example, the query ~~~“reset my password” later ~“thanks” may match the following conversation:
  • the segment modifiers 410 are additional modifiers that may be placed to the left of a segment to restrict it to a certain property or modify it in some other way.
  • Exemplary segment modifiers 410 include “agent”, “caller”, and “not”.
  • the “agent” segment modifier applies if an agent says the following phrase.
  • An example query is agent ~~“great to hear”.
  • the “caller” segment modifier applies if a caller says the following phrase.
  • An example query is caller ~~“very helpful”.
  • the “not” segment modifier applies if the following phrase does not occur.
  • An exemplary query is not ~~“claim”.
  • the segment modifiers 410 may be stacked (although order can affect meaning). For example, not agent ~~“sorry” matches a conversation in which an agent does not apologize.
  • the compound queries 412 are utilized to build more complex queries.
  • the compound queries 412 may be indicated by the utilization of parentheses in one embodiment. Other embodiments may utilize symbols to indicate the compound queries 412 .
  • Inner scanners are evaluated and then combined with outer scanners. An example is (~~“crash” near ~~“police report”) or ~~~“file a claim”. This phrase matches if a crash and police report are both mentioned or if a claim is filed (or both). However, “police report” alone would not match.
  • the compound queries 412 may be done multiple times, such as ((((~~“crash” near ~~“police report”) or ~~~“file a claim”) later agent ~~“sorry”) and caller not ~~“thank you”) or “thank you for your help with the claim”.
  • the extractors 414 are special phrases that may be indicated by curly braces “{ }” that represent a concept. In some embodiments, the extractors 414 are treated as if they have two tildes and thus can be omitted.
  • the query ~~“hello my name is {name}” may match “hi my name is George”.
  • the time operators 416 place time constraints on scanners.
  • a maximum duration, or less than an amount of time has passed, may be specified by utilizing an indicator, such as square brackets as well as the less than operator, a number, and units, such as [<30 s] is less than 30 seconds, [<5 s] is less than five seconds, and [<5 m] is less than five minutes.
  • the query “interest rate” [<30 s] “a. p. r.” looks for the phrase “a. p. r.” less than thirty seconds after “interest rate”.
  • a minimum duration is similar to the maximum duration but requires that there be more than the specified amount of time between phrases.
  • Start and end tokens are time operators 416 that may be utilized to specify the start and end of the call. For example, {start} [<30 s] “thanks for calling” looks for “thanks for calling” being said in the first thirty seconds. Similarly, {end} can indicate the end of the call. The query “anything else today” [>1 m] {end} may enforce that “anything else today” was said greater than a minute before the end of the call.
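  • The bracketed time operators lend themselves to a small parser. The sketch below assumes the bracket syntax with a less-than or greater-than sign, a whole number, and a unit of "s" or "m"; the names `parse_time_operator` and `satisfies` are hypothetical, and the gap between two phrases is assumed to be available in seconds.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60}
TIME_OP = re.compile(r"\[\s*([<>])\s*(\d+)\s*([sm])\s*\]")

def parse_time_operator(token):
    """Turn a token such as "[<30 s]" into (comparator, limit_in_seconds)."""
    m = TIME_OP.fullmatch(token.strip())
    if m is None:
        raise ValueError(f"not a time operator: {token!r}")
    comparator, amount, unit = m.groups()
    return comparator, int(amount) * UNIT_SECONDS[unit]

def satisfies(gap_seconds, constraint):
    """Check the time between two matched phrases against the constraint."""
    comparator, limit = constraint
    return gap_seconds < limit if comparator == "<" else gap_seconds > limit

maximum = parse_time_operator("[<30 s]")  # maximum duration
minimum = parse_time_operator("[>1 m]")   # minimum duration
```

  • A start or end token could be handled the same way by treating the start or end of the call as one of the two timestamps.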
  • the metadata 418 may be utilized to place constraints on call metadata, such as the date, start time, duration, or user-provided metadata.
  • the metadata queries may be performed first, and then the scanner is performed on the resulting subset.
  • a query tree 500 comprises a first literal 502 , a first operator 504 , a compound query 506 , a second literal 508 , a second operator 510 , and a third literal 512 .
  • the query tree 500 is generated from the query: ~~“lost” and (~~“debit” then “card”). The query is then compared to the transcript: “i think i have misplaced my credit card”.
  • the second operator 510 is determined to be the operator within the compound query 506 and is placed within the second level of the query tree 500 .
  • the literals for the second operator 510 , the second literal 508 and the third literal 512 , are determined and placed in the first level of the query tree 500 , connected to the second operator 510 .
  • the word or phrase of the literal and the associated softness is determined, which will then be utilized to compare to the transcript.
  • the next operator, the first operator 504 is then determined and placed in the third level of the query tree 500 .
  • the connectors are then determined for the first operator 504 , which are the first literal 502 and the second operator 510 .
  • the first literal 502 also has its word or phrase and associated softness determined to be utilized to compare to the transcript.
  • a word embedding method 600 determines a number of words for the query or transcript (block 602 ).
  • the word embedding method 600 may be performed on both the query and the transcript.
  • the query or transcript vector is generated with a length equal to the number of words (block 604 ).
  • the first word is selected (block 606 ) and set as the current word (block 608 ).
  • the embedding vector for the current word is then determined (block 610 ).
  • the embedding vector may be pre-determined and stored to be retrieved.
  • the embedding vector may be between 50 and 1000 dimensions in some embodiments.
  • the embedding vector is placed into the query or transcript vector (block 612 ).
  • the embedding vector replaces the word in the query or transcript vector.
  • the word embedding method 600 determines whether there is another word (decision block 614 ). If so, the next word is selected (block 616 ) and the word embedding method 600 is performed from block 608 . Once the words are replaced by their word embeddings, the word embedding method 600 ends.
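  • The loop of the word embedding method 600 can be sketched as a table lookup per word. The 3-dimensional table, the whitespace tokenization, and the zero vector for unknown words are assumptions of this sketch; a real table would hold pre-determined embeddings with many more dimensions.

```python
import numpy as np

# Hypothetical 3-dimensional lookup table standing in for stored embeddings.
EMBEDDINGS = {
    "lost": np.array([0.9, 0.1, 0.0]),
    "my": np.array([0.0, 0.2, 0.1]),
    "card": np.array([0.1, 0.0, 0.9]),
}

def embed(text, table, dim=3):
    """Replace every word of a query or transcript with its embedding vector."""
    words = text.lower().split()           # block 602: determine the number of words
    matrix = np.zeros((len(words), dim))   # block 604: one slot per word
    for i, word in enumerate(words):       # blocks 606-616: walk through the words
        # Blocks 610-612: retrieve the stored vector and place it in the matrix.
        # Unknown words fall back to a zero vector (an assumption of this sketch).
        matrix[i] = table.get(word, np.zeros(dim))
    return matrix

M = embed("lost my card", EMBEDDINGS)
```

  • The same function would be applied to each literal of the query and to the transcript, giving two matrices with the same number of columns.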
  • a fast Fourier transformation system 700 comprises a query word embedding matrix 702 , a transcript word embedding matrix 704 , a Fourier fast transformer 706 , a cross-correlator 708 , and an inverse Fourier fast transformer 710 .
  • the query word embedding matrix 702 and the transcript word embedding matrix 704 may be received from a matrix generator.
  • the Fourier fast transformer 706 performs a Fourier transformation on the query word embedding matrix 702 and transcript word embedding matrix 704 to accelerate the performance of the cross-correlator 708 when generating the dot products for comparison.
  • the cross-correlator 708 may perform point-wise multiplication and send the results to the inverse Fourier fast transformer 710 .
  • the output of the cross-correlator 708 may then be reverse transformed by the inverse Fourier fast transformer 710 using an inverse Fourier transform.
  • the fast Fourier transformation system 700 may be operated in accordance with the process depicted in FIG. 8 .
  • the fast Fourier transformation system 700 may be the default or an alternate system to perform the cross-correlation.
  • a threshold may be utilized, based on factors, such as matrix size, to determine whether to utilize the fast Fourier transformation system 700 .
  • a fast Fourier transformation method 800 receives transcript and query matrices (block 802 ).
  • a Fourier transform is applied on the transcript matrix and the query matrix (block 804 ).
  • a point-wise multiplication is applied between the matrices (block 806 ).
  • An inverse Fourier transform is applied to the point-wise product of the matrices (block 808 ).
  • the resulting “dot products” are then sent to a comparator to determine any matches.
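  • The Fourier route of method 800 can be sketched with NumPy's real FFT. This is an illustration under assumptions: the transform is taken along the word axis of each embedding dimension, the inputs are zero-padded to avoid wrap-around, and the query spectrum is conjugated so that the point-wise product yields a correlation. The conjugation and the padding length are details chosen here; the text only names the transform, the point-wise multiplication, and the inverse transform.

```python
import numpy as np

def fft_cross_correlate(T, Q):
    """Cross-correlate T (l, d) with Q (m, d) along the word axis using the FFT."""
    l, m = T.shape[0], Q.shape[0]
    n = l + m - 1                                 # zero-padding avoids wrap-around
    FT = np.fft.rfft(T, n=n, axis=0)              # block 804: transform both matrices
    FQ = np.fft.rfft(Q, n=n, axis=0)
    product = FT * np.conj(FQ)                    # block 806: point-wise multiplication
    per_dim = np.fft.irfft(product, n=n, axis=0)  # block 808: inverse transform
    # Keep the offsets where the query fits entirely inside the transcript,
    # then sum over the embedding dimensions to get one score per offset.
    return per_dim[: l - m + 1].sum(axis=1)

def direct_cross_correlate(T, Q):
    """Reference implementation: explicit sliding dot product."""
    l, m = T.shape[0], Q.shape[0]
    return np.array([np.sum(T[k:k + m] * Q) for k in range(l - m + 1)])

rng = np.random.default_rng(0)
T = rng.normal(size=(50, 8))  # toy transcript: 50 words, 8 dimensions
Q = T[20:23].copy()           # toy query: three consecutive transcript words

fast = fft_cross_correlate(T, Q)
direct = direct_cross_correlate(T, Q)
```

  • Both routes give the same scores up to floating-point error; the FFT version scales roughly with l log l in the transcript length, which is consistent with using a matrix-size threshold to decide when it is worth applying.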
  • a match combination method 900 replaces literals with matches (block 902 ).
  • the literals may be received as part of a tree of operations.
  • the matches may be received from a comparator.
  • the number of levels in the tree of operations is determined (block 904 ).
  • the lowest level is selected (block 906 ).
  • a first pair of matches at the level is selected (block 908 ). If multiple pairs are at the same level, one may be selected randomly, or by position (e.g., left-most) to be performed first if performed in series. The pairs may be evaluated in parallel. In cases of a unary operator, the literal for that operator is selected.
  • the “literal” to be operated on is the result of an operator acting on a literal(s), such as for a compound query.
  • the connecting operator is determined (block 910 ).
  • the operation corresponding to the operator is determined (block 912 ).
  • the operation may be stored along with the operator and retrieved to be performed on the literal(s).
  • Exemplary operations include the “and” operator requiring a match in both literals.
  • the new start is the minimum of the two original literal starts.
  • the new end is the maximum of the two ends.
  • the new match is the original two match strings concatenated with “and” (e.g., “credit” and “card”).
  • the new query is combined in a similar way.
  • the weight is the product of the input weights.
  • a threshold value is applied after the operation is performed to remove the match as a potential output.
  • the match combination method 900 determines whether there is another pair at the level (decision block 918 ). If so, the next pair of matches is selected (block 920 ). As above, one literal or reduced operator is selected for a unary operator. The match combination method 900 then is performed on the next pair from block 910 .
  • the match combination method 900 determines if there is another level (decision block 922 ). If so, the next level is selected (block 924 ), and the match combination method 900 is performed on the next level from block 908 . Once all levels have been reduced, an output is generated. The output may include the start, end, weight, query, match, and extractions. Other information may be provided. The output may also be applied to the transcript to, for example, highlight the output. The match combination method 900 then ends (done block 926 ).
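  • The composition rule described for the “and” operator can be sketched as a function on two match records. Only that one operator is covered; the `Match` fields mirror the output items listed in the text, while the class itself, the function name `combine_and`, and the `min_weight` cut-off parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Match:
    start: int     # position of the first matched word
    end: int       # position of the last matched word
    weight: float  # confidence of the match
    query: str     # the query text that produced the match
    match: str     # the transcript text that was matched

def combine_and(a: Match, b: Match, min_weight: float = 0.0) -> Optional[Match]:
    """Combine two matches under "and" and drop the result if its weight is too low."""
    combined = Match(
        start=min(a.start, b.start),           # new start: minimum of the two starts
        end=max(a.end, b.end),                 # new end: maximum of the two ends
        weight=a.weight * b.weight,            # new weight: product of the weights
        query=f"{a.query} and {b.query}",      # queries joined with "and"
        match=f"{a.match} and {b.match}",      # match strings joined with "and"
    )
    return combined if combined.weight >= min_weight else None

credit = Match(start=6, end=6, weight=0.8, query='~~"debit"', match="credit")
card = Match(start=7, end=7, weight=0.5, query='"card"', match="card")
both = combine_and(credit, card)
```

  • Executing a whole tree of operations would apply one such rule per operator, level by level, with each result standing in as the “literal” for the operator above it.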
  • a communication system 1000 comprises a first person 1002 , a second person 1004 , a network 1006 , an audio transformation system 1008 , a speech to text converter 1010 , an analog to digital converter 1012 , an enrichment logic 1014 , a digital transcript 1016 , a third person 1018 , a query engine 1020 , a query parser 1022 , a search engine 1024 , a quantitative thesaurus matrix 1026 , and a combiner 1028 .
  • the first person 1002 is in audio communication with a second person 1004 over a network 1006 , for example an IP network, analog telephone network, or cellular network.
  • Audio from the communications may be recorded, or streamed live to an audio transformation system 1008 , which converts the audio to metadata-enriched text.
  • the audio transformation system 1008 may comprise a speech to text converter 1010 and enrichment logic 1014 to transform the audio into the enriched text. If the audio is in an analog format, the audio transformation system 1008 may utilize an analog to digital converter 1012 to convert to a digital format before providing the digital audio to the speech to text converter 1010 .
  • the enriched text of the audio is output in the form of one or more digital files of a digital transcript 1016 .
  • a third person 1018 may search the digital transcript 1016 using queries.
  • the queries, along with the digital transcript 1016 , are operated on by a query engine 1020 .
  • the query engine 1020 may be operated according to the process depicted in FIG. 11 .
  • the query engine 1020 inputs the query to a query parser 1022 to generate a tree of operations from words (literals) and operators of the query.
  • the query parser 1022 may generate the tree of operations in accordance with the process depicted in FIG. 3 .
  • the query parser 1022 may further utilize the query types 400 to parse the query into the tree of operations.
  • the literals and the digital transcript 1016 are input to a search engine 1024 , which then retrieves the matches from the quantitative thesaurus matrix 1026 .
  • the quantitative thesaurus matrix 1026 may be generated based on the process depicted in FIG. 12 .
  • the matches are combined (combiner 1028 ) based on the operators extracted from the query by the query parser 1022 .
  • the combiner 1028 may be operated in accordance with the process depicted in FIG. 9 .
  • the transcript is received (block 1102 ).
  • the query is also received (block 1104 ).
  • the query is transformed into a tree of operations comprising literals and operators (block 1106 ). This step may be performed in accordance with the process depicted in FIG. 3 .
  • the literals and transcript are transformed into vectors of words (block 1108 ).
  • the stored value for each query word-transcript word pair is retrieved (block 1110 ).
  • the value may be stored in a sparse matrix.
  • the sparse matrix may be generated in accordance with the process depicted in FIG. 12 . Multiple thesauruses may be generated for each softness level.
  • the sparse matrix of word similarities may also be utilized to “explode” a query into the similar words.
  • The exploded queries may also have similar composition rules for operators. This enables an approximate version of the scanner algorithm to be run as a pre-process against a traditional search index. For example, a soft query for “lost” may be exploded to a hard query of “lost”, “misplaced”, “missing”, etc. against a traditional search index. For single-word queries, this is exact. For phrase matches, this is approximate, but with suitably chosen thresholds it may be a close approximation.
  • the retrieved values are set as matches (block 1112 ).
  • the matches and operators are utilized to execute the tree of operations and return output (block 1114 ). This may be performed in accordance with the process depicted in FIG. 9 .
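The lookup steps of blocks 1108 to 1112 can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary-based sparse matrix, the helper names (`to_word_vector`, `retrieve_values`, `matches`), and the stored scores are all assumptions made for the example. Pairs that are absent from the sparse matrix are treated as having a value of 0.0.

```python
from typing import Dict, List, Tuple

# Hypothetical sparse thesaurus: only pairs that survived thresholding are
# stored; every missing pair is treated as 0.0.
SparseMatrix = Dict[Tuple[str, str], float]

THESAURUS: SparseMatrix = {
    ("lost", "lost"): 1.0,
    ("lost", "misplaced"): 0.9,
    ("card", "card"): 1.0,
}


def to_word_vector(text: str) -> List[str]:
    """Block 1108: turn a literal or a transcript into a vector of words."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return [w for w in words if w]


def retrieve_values(literal: str, transcript: str,
                    thesaurus: SparseMatrix) -> List[List[float]]:
    """Block 1110: look up the value of each query word-transcript word pair."""
    query_words = to_word_vector(literal)
    transcript_words = to_word_vector(transcript)
    return [[thesaurus.get((q, t), 0.0) for t in transcript_words]
            for q in query_words]


def matches(values: List[List[float]]) -> List[List[int]]:
    """Block 1112: transcript positions, per query word, with a stored value."""
    return [[i for i, v in enumerate(row) if v > 0.0] for row in values]


values = retrieve_values("lost", "I misplaced my card", THESAURUS)
print(values)           # [[0.0, 0.9, 0.0, 0.0]]
print(matches(values))  # [[1]]
```

Because thresholding is assumed to have happened when the sparse matrix was built, the mere presence of a stored value is treated as a match here.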
  • the query method 1100 is utilized to pre-process a transcript comprising multiple documents.
  • The search may be utilized to reduce the number of documents on which the full scanner matrix operation is performed to a small set of very relevant documents. That is, the transcript may initially include multiple documents.
  • The query method 1100 is applied, and those documents containing the similar words are kept in the transcript for the full scanner operation, such as the process depicted in FIG. 2 .
  • a dot product is performed between two word vectors (block 1202 ).
  • thresholding softness is performed on the dot product (block 1204 ).
  • the result is stored in a sparse matrix (block 1206 ).
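The three blocks of FIG. 12 can be illustrated with a short sketch. The three-dimensional word vectors, the threshold of 0.8, and the function names are hypothetical. Normalizing the vectors to unit length before taking the dot product is also an assumption, made so that scores fall between -1 and 1 and can be compared with a single threshold.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical 3-dimensional word vectors, for illustration only.
EMBEDDINGS: Dict[str, List[float]] = {
    "lost": [0.9, 0.1, 0.0],
    "misplaced": [0.8, 0.2, 0.1],
    "zebra": [0.0, 0.1, 0.9],
}


def normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length (an assumption, see the lead-in)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_thesaurus(embeddings: Dict[str, List[float]],
                    threshold: float) -> Dict[Tuple[str, str], float]:
    unit = {word: normalize(vec) for word, vec in embeddings.items()}
    sparse: Dict[Tuple[str, str], float] = {}
    for a, vec_a in unit.items():
        for b, vec_b in unit.items():
            score = dot(vec_a, vec_b)    # block 1202: dot product
            if score >= threshold:       # block 1204: threshold for softness
                sparse[(a, b)] = score   # block 1206: store in sparse matrix
    return sparse


thesaurus = build_thesaurus(EMBEDDINGS, threshold=0.8)
print(sorted(thesaurus))
```

Only pairs at or above the threshold are kept, which is what makes the resulting matrix sparse. A separate matrix could be built per softness level by calling `build_thesaurus` with a different threshold.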
  • An exemplary thesaurus is depicted in FIG. 13 .
  • a thesaurus 1300 comprises similarity scores 1302 for the words aardvark, lost, misplaced, and zebra.
  • the similarity scores 1302 may be determined by the process depicted in FIG. 12 .
  • the thesaurus 1300 may be searched.
  • The similarity score may then be utilized, along with the similar word(s), to construct one or more further searches.
  • the similar word(s) may also be utilized to reduce a set of documents with those words. For example, if lost was the query word, misplaced may be selected as a similar word as the similarity score is 0.9. However, aardvark and zebra may not be selected as the similarity score is 0.1.
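As a sketch of how the thesaurus 1300 might be searched and then used to reduce a document set, the example below uses the scores quoted above for “lost”. The self-similarity of 1.0, the cut-off of 0.5, the sample documents, and the function names are assumptions, and a simple set intersection stands in for a traditional search index.

```python
from typing import Dict, List

# Scores for the query word "lost" as quoted for thesaurus 1300; the
# self-similarity of 1.0 is an assumption.
THESAURUS_1300: Dict[str, Dict[str, float]] = {
    "lost": {"aardvark": 0.1, "lost": 1.0, "misplaced": 0.9, "zebra": 0.1},
}


def similar_words(word: str, thesaurus: Dict[str, Dict[str, float]],
                  min_score: float) -> List[str]:
    """Return the words whose similarity to `word` is at least `min_score`."""
    row = thesaurus.get(word, {})
    return sorted(w for w, score in row.items() if score >= min_score)


def reduce_documents(documents: List[str], words: List[str]) -> List[str]:
    """Keep only the documents that contain at least one of `words`."""
    wanted = set(words)
    kept = []
    for doc in documents:
        tokens = {t.strip(".,!?").lower() for t in doc.split()}
        if tokens & wanted:
            kept.append(doc)
    return kept


DOCUMENTS = [
    "I misplaced my card",
    "The zebra crossed the road",
    "She lost her keys",
]

terms = similar_words("lost", THESAURUS_1300, min_score=0.5)
print(terms)                               # ['lost', 'misplaced']
print(reduce_documents(DOCUMENTS, terms))  # ['I misplaced my card', 'She lost her keys']
```

The reduced document set is what a full scanner operation would then be run against.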
  • FIG. 14 is an example block diagram of a computing device 1400 (or computing apparatus) that may incorporate embodiments of the present invention.
  • FIG. 14 is merely illustrative of a machine system to carry out aspects of the technical processes described herein, and does not limit the scope of the claims.
  • the computing device 1400 typically includes a monitor or graphical user interface 1402 , a data processing system 1420 , a communication network interface 1412 , input device(s) 1408 , output device(s) 1406 , and the like.
  • the data processing system 1420 may include one or more processor(s) 1404 that communicate with a number of peripheral devices via a bus subsystem 1418 .
  • peripheral devices may include input device(s) 1408 , output device(s) 1406 , communication network interface 1412 , and a storage subsystem, such as a volatile memory 1410 and a nonvolatile memory 1414 .
  • The volatile memory 1410 and/or the nonvolatile memory 1414 may store computer-executable instructions, thus forming logic 1422 that, when applied to and executed by the processor(s) 1404 , implements embodiments of the processes disclosed herein.
  • the input device(s) 1408 include devices and mechanisms for inputting information to the data processing system 1420 . These may include a keyboard, a keypad, a touch screen incorporated into the monitor or graphical user interface 1402 , audio input devices such as voice recognition systems, microphones, and other types of input devices. In various embodiments, the input device(s) 1408 may be embodied as a computer mouse, a trackball, a track pad, a joystick, wireless remote, drawing tablet, voice command system, eye tracking system, and the like. The input device(s) 1408 typically allow a user to select objects, icons, control areas, text and the like that appear on the monitor or graphical user interface 1402 via a command such as a click of a button or the like.
  • the output device(s) 1406 include devices and mechanisms for outputting information from the data processing system 1420 . These may include the monitor or graphical user interface 1402 , speakers, printers, infrared LEDs, and so on as well understood in the art.
  • the communication network interface 1412 provides an interface to communication networks (e.g., communication network 1416 ) and devices external to the data processing system 1420 .
  • the communication network interface 1412 may serve as an interface for receiving data from and transmitting data to other systems.
  • Embodiments of the communication network interface 1412 may include an Ethernet interface, a modem (telephone, satellite, cable, ISDN), (asymmetric) digital subscriber line (DSL), FireWire, USB, a wireless communication interface such as Bluetooth or Wi-Fi, a near field communication wireless interface, a cellular interface, and the like.
  • the communication network interface 1412 may be coupled to the communication network 1416 via an antenna, a cable, or the like.
  • the communication network interface 1412 may be physically integrated on a circuit board of the data processing system 1420 , or in some cases may be implemented in software or firmware, such as “soft modems”, or the like.
  • the computing device 1400 may include logic that enables communications over a network using protocols such as HTTP, TCP/IP, RTP/RTSP, IPX, UDP and the like.
  • the volatile memory 1410 and the nonvolatile memory 1414 are examples of tangible media configured to store computer readable data and instructions to implement various embodiments of the processes described herein.
  • Other types of tangible media include removable memory (e.g., pluggable USB memory devices, mobile device SIM cards), optical storage media such as CD-ROMs and DVDs, semiconductor memories such as flash memories, non-transitory read-only memories (ROMs), battery-backed volatile memories, networked storage devices, and the like.
  • the volatile memory 1410 and the nonvolatile memory 1414 may be configured to store the basic programming and data constructs that provide the functionality of the disclosed processes and other embodiments thereof that fall within the scope of the present invention.
  • Logic 1422 that implements embodiments of the present invention may be stored in the volatile memory 1410 and/or the nonvolatile memory 1414 . Said logic 1422 may be read from the volatile memory 1410 and/or nonvolatile memory 1414 and executed by the processor(s) 1404 . The volatile memory 1410 and the nonvolatile memory 1414 may also provide a repository for storing data used by the logic 1422 .
  • the volatile memory 1410 and the nonvolatile memory 1414 may include a number of memories including a main random access memory (RAM) for storage of instructions and data during program execution and a read only memory (ROM) in which read-only non-transitory instructions are stored.
  • the volatile memory 1410 and the nonvolatile memory 1414 may include a file storage subsystem providing persistent (non-volatile) storage for program and data files.
  • the volatile memory 1410 and the nonvolatile memory 1414 may include removable storage systems, such as removable flash memory.
  • The bus subsystem 1418 provides a mechanism for enabling the various components and subsystems of the data processing system 1420 to communicate with each other as intended. Although the bus subsystem 1418 is depicted schematically as a single bus, some embodiments of the bus subsystem 1418 may utilize multiple distinct busses.
  • the computing device 1400 may be a device such as a smartphone, a desktop computer, a laptop computer, a rack-mounted computer system, a computer server, or a tablet computer device. As commonly known in the art, the computing device 1400 may be implemented as a collection of multiple networked computing devices. Further, the computing device 1400 will typically include operating system logic (not illustrated) the types and nature of which are well known in the art.
  • “Circuitry” in this context refers to electrical circuitry having at least one discrete electrical circuit, electrical circuitry having at least one integrated circuit, electrical circuitry having at least one application specific integrated circuit, circuitry forming a general purpose computing device configured by a computer program (e.g., a general purpose computer configured by a computer program which at least partially carries out processes or devices described herein, or a microprocessor configured by a computer program which at least partially carries out processes or devices described herein), circuitry forming a memory device (e.g., forms of random access memory), or circuitry forming a communications device (e.g., a modem, communications switch, or optical-electrical equipment).
  • “Firmware” in this context refers to software logic embodied as processor-executable instructions stored in read-only memories or media.
  • “Hardware” in this context refers to logic embodied as analog or digital circuitry.
  • “Logic” in this context refers to machine memory circuits, non-transitory machine readable media, and/or circuitry which by way of its material and/or material-energy configuration comprises control and/or procedural signals, and/or settings and values (such as resistance, impedance, capacitance, inductance, current/voltage ratings, etc.), that may be applied to influence the operation of a device.
  • Magnetic media, electronic circuits, electrical and optical memory (both volatile and nonvolatile), and firmware are examples of logic.
  • Logic specifically excludes pure signals or software per se (however does not exclude machine memories comprising software and thereby forming configurations of matter).
  • “Software” in this context refers to logic implemented as processor-executable instructions in a machine memory (e.g. read/write volatile or nonvolatile memory or media).
  • “quantitative thesaurus matrix” in this context refers to a matrix of similarity scores with indexes of query word-transcript word pairs.
  • “tree of operations” in this context refers to a structure depicting the order of operations of operators on the literals and the matches to the literals.
  • “query” in this context refers to a string of symbols that includes at least one literal and may include multiple literals and operators. E.g., “lost” then “card” includes two literals, lost and card, as well as the operator, then.
  • “literal” in this context refers to a word or phrase. E.g., “card”.
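To make the relationship between a query, its literals, its operators, and a tree of operations concrete, here is a small sketch. It assumes literals are written in straight double quotes and operators are bare words, and it folds operators left to right with no precedence rules. The actual parsing process of FIG. 3 is not reproduced here, and the names `tokenize` and `build_tree` are hypothetical.

```python
import re
from typing import List, Tuple, Union

# A literal is text inside double quotes; anything else is read as an operator.
TOKEN = re.compile(r'"([^"]+)"|(\w+)')

Tree = Union[str, Tuple[str, "Tree", "Tree"]]


def tokenize(query: str) -> List[Tuple[str, str]]:
    """Split a query string into ("literal", text) and ("operator", name) tokens."""
    tokens = []
    for literal, operator in TOKEN.findall(query):
        if literal:
            tokens.append(("literal", literal))
        else:
            tokens.append(("operator", operator.lower()))
    return tokens


def build_tree(tokens: List[Tuple[str, str]]) -> Tree:
    """Fold alternating literal/operator tokens into (operator, left, right) tuples.

    Assumes the tokens alternate literal, operator, literal, ... and applies
    operators strictly left to right.
    """
    tree: Tree = tokens[0][1]
    for i in range(1, len(tokens) - 1, 2):
        operator = tokens[i][1]
        right = tokens[i + 1][1]
        tree = (operator, tree, right)
    return tree


tokens = tokenize('"lost" then "card"')
print(tokens)              # [('literal', 'lost'), ('operator', 'then'), ('literal', 'card')]
print(build_tree(tokens))  # ('then', 'lost', 'card')
```

A nested tuple is only one possible representation; a combiner would walk such a structure from the leaves upward, applying each operator to the matches of its operands.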
  • “query word-transcript word pair” in this context refers to a pair of words determined by combining one word from the query matrix and one word from the transcript matrix.
  • For example, for the query word “lost” and the transcript “I misplaced my card,” there are four pairs: [lost, I], [lost, misplaced], [lost, my], and [lost, card].
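The four pairs listed above are the Cartesian product of the query words and the transcript words. A minimal sketch, with variable names chosen for illustration:

```python
from itertools import product

query_matrix = ["lost"]
transcript_matrix = ["I", "misplaced", "my", "card"]

# Combine every query word with every transcript word.
pairs = [list(pair) for pair in product(query_matrix, transcript_matrix)]
print(pairs)
# [['lost', 'I'], ['lost', 'misplaced'], ['lost', 'my'], ['lost', 'card']]
```

A query of Q words against a transcript of T words therefore yields Q × T pairs, each of which indexes one entry of the quantitative thesaurus matrix.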
  • “Word embedding” in this context refers to a learned representation for text where words that have the same meaning have a similar representation in a compact vector space.
  • A benefit of the dense representations is generalization power: if certain features of how words are used in context provide clues to their similar meaning, the word embedding representation may reflect these similarities.
  • Word embeddings are a class of techniques where individual words are represented as real-valued vectors in a predefined vector space. Each word is mapped to one vector and the vector values can be learned, for example using a neural network. Each word is represented by a real-valued vector, often tens or hundreds of dimensions. This is contrasted to the thousands or millions of dimensions required for sparse word representations, such as a one-hot encoding.
  • Each word in the vocabulary is represented by a feature vector that encodes different aspects of the word.
  • each word is associated with a point in a vector space.
  • the number of features (and hence the dimensionality of the vector) is much smaller than the size of the vocabulary.
  • the distributed vector representation is learned based on the usage of words. This allows words that are used in similar ways to result in having similar vector representations, naturally capturing their meaning. This can be contrasted with the crisp but fragile representation in a bag of words model where, unless explicitly managed, different words have different representations, regardless of how they are used.
  • the underlying linguistic theory is that words that have similar context will have similar meanings. “You shall know a word by the company it keeps.”
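The contrast between a sparse one-hot encoding and a dense embedding can be shown in a few lines. The two-dimensional dense vectors below are invented for the example; real embeddings are learned and typically have tens or hundreds of dimensions.

```python
import math
from typing import List

VOCABULARY = ["aardvark", "lost", "misplaced", "zebra"]

# Hypothetical dense vectors; a trained model would supply these.
DENSE = {
    "lost": [0.9, 0.1],
    "misplaced": [0.8, 0.3],
    "zebra": [0.1, 0.9],
}


def one_hot(word: str, vocabulary: List[str]) -> List[float]:
    """One dimension per vocabulary word, with a single 1.0."""
    return [1.0 if w == word else 0.0 for w in vocabulary]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Different words are always orthogonal under one-hot encoding.
print(cosine(one_hot("lost", VOCABULARY), one_hot("misplaced", VOCABULARY)))  # 0.0

# Dense vectors can place related words close together.
print(cosine(DENSE["lost"], DENSE["misplaced"]) > cosine(DENSE["lost"], DENSE["zebra"]))  # True
```

The one-hot vectors grow with the vocabulary and carry no notion of relatedness, whereas the dense vectors keep a fixed, small dimensionality and let similarity be measured directly.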
  • “softness” in this context refers to a degree of relatedness between words. E.g., a softness of 2 may correspond to a synonym.
  • “query matrix” in this context refers to a vector with a length corresponding to the number of words in a literal and comprising the literal.
  • the query matrix for the query “card” is a 1×1 matrix of [card].
  • the query matrix for the query “today is beautiful” is a 3×1 matrix: [today, is, beautiful].
  • “query flag” in this context refers to an indicator that a particular non-speech information is to be utilized for a word in a query. E.g., a “1” may indicate utilization and a “0” non-utilization.
  • “matches” in this context refers to a cross-correlation that exceeds a softness map.
  • “softness map” in this context refers to a threshold value corresponding to a given softness.
  • a softness 1 may correspond to a softness map of 0.95.
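A softness map can be pictured as a lookup from softness level to threshold. In the sketch below only the entry for softness 1 (0.95) comes from the text; the value for softness 2 and the function name are assumptions added for illustration.

```python
from typing import Dict

# Softness level -> threshold. Only 1 -> 0.95 is taken from the description;
# the entry for softness 2 is a hypothetical value.
SOFTNESS_MAP: Dict[int, float] = {
    1: 0.95,
    2: 0.80,
}


def is_match(score: float, softness: int) -> bool:
    """A score is a match when it exceeds the threshold for the given softness."""
    return score > SOFTNESS_MAP[softness]


print(is_match(0.9, softness=1))  # False
print(is_match(0.9, softness=2))  # True
```

With these values, a pair scoring 0.9 would be rejected at softness 1 but accepted at the looser softness 2.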
  • “non-speech information” in this context refers to information regarding the meaning of a word, such as emotion, the speaker, etc. that is not the word itself.
  • “cross-correlation” in this context refers to a measure of similarity of two series as a function of the displacement of one relative to the other.
  • “transcript matrix” in this context refers to a vector with a length corresponding to the number of words in a transcript and comprising the words of the transcript.
  • the transcript matrix for the transcript “Hi, my name is Al” is a 5×1 matrix of [Hi, my, name, is, Al].
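The query matrix and transcript matrix examples can be reproduced with a short sketch. The regular expression used to split words and the helper names are assumptions; a production system would take its words from the speech to text converter rather than from punctuation-based splitting.

```python
import re
from typing import List, Tuple


def to_matrix(text: str) -> List[List[str]]:
    """Return an N x 1 column (a list of one-element rows) of the words in `text`."""
    return [[word] for word in re.findall(r"[A-Za-z0-9']+", text)]


def shape(matrix: List[List[str]]) -> Tuple[int, int]:
    return (len(matrix), len(matrix[0]) if matrix else 0)


transcript_matrix = to_matrix("Hi, my name is Al")
query_matrix = to_matrix("today is beautiful")

print(transcript_matrix)         # [['Hi'], ['my'], ['name'], ['is'], ['Al']]
print(shape(transcript_matrix))  # (5, 1)
print(shape(query_matrix))       # (3, 1)
```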
  • “operator” in this context refers to a symbolic representation of an operation to be performed on one or two literals. E.g., and, then, or, etc.
  • “similarity score” in this context refers to a measure of the similarity between two words for a softness value.
  • the similarity score for two words may be determined by the cross-correlation of the N-dimensional word vectors of the two words.
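Cross-correlation as defined above, a similarity measured at each displacement of one series relative to the other, can be sketched in plain Python. The function name and the sample series are illustrative. For two vectors of equal length, the value at zero displacement is simply their dot product.

```python
from typing import Dict, List


def cross_correlation(a: List[float], b: List[float]) -> Dict[int, float]:
    """Return {lag: sum over i of a[i] * b[i + lag]} for every overlapping lag."""
    result: Dict[int, float] = {}
    for lag in range(-(len(a) - 1), len(b)):
        total = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                total += x * b[j]
        result[lag] = total
    return result


pattern = [1, 2, 3]
series = [0, 1, 2, 3, 0]

scores = cross_correlation(pattern, series)
best_lag = max(scores, key=scores.get)
print(best_lag, scores[best_lag])  # 1 14.0
```

The peak at a displacement of 1 shows where the pattern lines up best with the longer series, which is the same idea a scanner uses when sliding a query across a transcript.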
  • references to “one embodiment” or “an embodiment” do not necessarily refer to the same embodiment, although they may.
  • the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively, unless expressly limited to a single one or multiple ones.
  • the words “herein,” “above,” “below” and words of similar import when used in this application, refer to this application as a whole and not to any particular portions of this application.
  • association operation may be carried out by an “associator” or “correlator”.
  • switching may be carried out by a “switch”, selection by a “selector”, and so on.

US16/109,553 2018-08-22 2018-08-22 Method for querying long-form speech Active 2039-02-06 US11138278B2 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US16/109,553 US11138278B2 (en) 2018-08-22 2018-08-22 Method for querying long-form speech
PCT/US2019/046700 WO2020041098A1 (en) 2018-08-22 2019-08-15 Method for querying long-form speech
EP19851374.9A EP3841488A4 (en) 2018-08-22 2019-08-15 METHOD OF REQUESTING LONG-FORM LANGUAGE
JP2021534120A JP7293359B2 (ja) 2018-08-22 2019-08-15 ロングフォーム言語の照会方法
US17/394,800 US11880420B2 (en) 2018-08-22 2021-08-05 Method for querying long-form speech
US18/524,697 US20240095292A1 (en) 2018-08-22 2023-11-30 Method for Querying Long-Form Speech

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/394,800 Continuation US11880420B2 (en) 2018-08-22 2021-08-05 Method for querying long-form speech

Publications (2)

Publication Number Publication Date
US20200065420A1 US20200065420A1 (en) 2020-02-27
US11138278B2 true US11138278B2 (en) 2021-10-05

Family

ID=69584610

Family Applications (3)

Application Number Title Priority Date Filing Date
US16/109,553 Active 2039-02-06 US11138278B2 (en) 2018-08-22 2018-08-22 Method for querying long-form speech
US17/394,800 Active US11880420B2 (en) 2018-08-22 2021-08-05 Method for querying long-form speech
US18/524,697 Pending US20240095292A1 (en) 2018-08-22 2023-11-30 Method for Querying Long-Form Speech

Country Status (4)

Country Link
US (3) US11138278B2
EP (1) EP3841488A4
JP (1) JP7293359B2
WO (1) WO2020041098A1

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11004449B2 (en) * 2018-11-29 2021-05-11 International Business Machines Corporation Vocal utterance based item inventory actions
US11204920B2 (en) * 2019-09-09 2021-12-21 Accenture Global Solutions Limited Utilizing search engine relevancy ranking models to generate normalized and comparable search engine scores
US11580102B2 (en) * 2020-04-02 2023-02-14 Ocient Holdings LLC Implementing linear algebra functions via decentralized execution of query operator flows
US11776549B2 (en) * 2020-11-06 2023-10-03 Google Llc Multi-factor audio watermarking
US20240414017A1 (en) * 2023-06-06 2024-12-12 Zoom Video Communications, Inc. Artificial intelligence (ai) system for handling a query about a conversation in a videoconferencing meeting
US20250140244A1 (en) * 2023-10-26 2025-05-01 Zoom Video Communications, Inc. Follow-up queries for large language models during virtual conferences
US12436950B1 (en) * 2024-03-29 2025-10-07 Microsoft Technology Licensing, Llc Machine learning accelerated semantic equivalence detection
CN118467573B (zh) * 2024-07-09 2024-11-15 广东省科技基础条件平台中心 一种基于多维度大数据筛选分析方法

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030177112A1 (en) * 2002-01-28 2003-09-18 Steve Gardner Ontology-based information management system and method
US20080320037A1 (en) 2007-05-04 2008-12-25 Macguire Sean Michael System, method and apparatus for tagging and processing multimedia content with the physical/emotional states of authors and users
US20090306979A1 (en) 2008-06-10 2009-12-10 Peeyush Jaiswal Data processing system for autonomously building speech identification and tagging data
US20110273455A1 (en) 2010-05-04 2011-11-10 Shazam Entertainment Ltd. Systems and Methods of Rendering a Textual Animation
US20110319160A1 (en) * 2010-06-25 2011-12-29 Idevcor Media, Inc. Systems and Methods for Creating and Delivering Skill-Enhancing Computer Applications
US20140089365A1 (en) * 2012-09-21 2014-03-27 Fondation de I'Institut de Recherche Idiap Object detection method, object detector and object detection computer program
US20150039538A1 (en) * 2012-06-01 2015-02-05 Mohamed Hefeeda Method for processing a large-scale data set, and associated apparatus
US20150227505A1 (en) * 2012-08-27 2015-08-13 Hitachi, Ltd. Word meaning relationship extraction device
US20170192994A1 (en) 2016-01-05 2017-07-06 The grät Network, PBC Systems and methods concerning tracking models for digital interactions
US20180330009A1 (en) * 2017-05-11 2018-11-15 Open Text Sa Ulc System and method for searching chains of regions and associated search operators
US20180349477A1 (en) * 2017-06-06 2018-12-06 Facebook, Inc. Tensor-Based Deep Relevance Model for Search on Online Social Networks
US20190117142A1 (en) * 2017-09-12 2019-04-25 AebeZe Labs Delivery of a Digital Therapeutic Method and System
US20190354594A1 (en) * 2018-05-20 2019-11-21 Microsoft Technology Licensing, Llc Building and deploying persona-based language generation models
US20200004875A1 (en) * 2018-06-29 2020-01-02 International Business Machines Corporation Query expansion using a graph of question and answer vocabulary

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6055531A (en) 1993-03-24 2000-04-25 Engate Incorporated Down-line transcription system having context sensitive searching capability
US8359282B2 (en) 2009-01-12 2013-01-22 Nec Laboratories America, Inc. Supervised semantic indexing and its extensions
WO2012134610A1 (en) * 2011-03-30 2012-10-04 Thomson Licensing Method for image playback verification
US9430563B2 (en) * 2012-02-02 2016-08-30 Xerox Corporation Document processing employing probabilistic topic modeling of documents represented as text words transformed to a continuous space
US11205274B2 (en) * 2018-04-03 2021-12-21 Altumview Systems Inc. High-performance visual object tracking for embedded vision systems

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Eugene Rules list, archived Oct. 18, 2014, https://web.archive.org/web/20141018124711/https://j5.jbei.org/j5manual/pages/85.html (Year: 2014). *
International Search Report for PCT Application PCT/US2019/046700, dated Nov. 12, 2019.
Written Opinion of the ISA for PCT Application PCT/US2019/046700, dated Nov. 12, 2019.

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20250139162A1 (en) * 2023-10-26 2025-05-01 Curio XR Systems and methods for browser extensions and large language models for interacting with video streams
US12524465B2 (en) * 2023-10-26 2026-01-13 Curioxr, Inc. Systems and methods for browser extensions and large language models for interacting with video streams

Also Published As

Publication number Publication date
US20240095292A1 (en) 2024-03-21
JP2021534528A (ja) 2021-12-09
EP3841488A4 (en) 2022-05-18
JP7293359B2 (ja) 2023-06-19
US11880420B2 (en) 2024-01-23
EP3841488A1 (en) 2021-06-30
US20200065420A1 (en) 2020-02-27
WO2020041098A1 (en) 2020-02-27
US20210365512A1 (en) 2021-11-25

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

AS Assignment

Owner name: GRIDSPACE INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SCODARY, ANTHONY;REEL/FRAME:047406/0713

Effective date: 20181025

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: USAA PROPERTY HOLDINGS, INC., TEXAS

Free format text: SECURITY INTEREST;ASSIGNOR:GRIDSPACE, INC.;REEL/FRAME:062391/0288

Effective date: 20221026

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 4