CN115547337A - Speech recognition method and related product - Google Patents

Speech recognition method and related product

Info

Publication number
CN115547337A
Authority
CN
China
Prior art keywords
scene
target
pinyin
user
text
Prior art date
Legal status
Granted
Application number
CN202211487069.2A
Other languages
Chinese (zh)
Other versions
CN115547337B (en)
Inventor
祝明
王曦
Current Assignee
Shenzhen Renma Interactive Technology Co Ltd
Original Assignee
Shenzhen Renma Interactive Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Renma Interactive Technology Co Ltd filed Critical Shenzhen Renma Interactive Technology Co Ltd
Priority to CN202211487069.2A
Publication of CN115547337A
Application granted
Publication of CN115547337B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/08: Speech classification or search
    • G10L 15/10: Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G10L 15/18: Speech classification or search using natural language modelling
    • G10L 15/1822: Parsing for meaning understanding
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26: Speech to text systems
    • G10L 15/28: Constructional details of speech recognition systems
    • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L 2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The application provides a speech recognition method and related products. In the method, a server calls a human-computer interaction engine to interact with a user through a terminal device and obtains the target voice information input by the user during the interaction. Character recognition is performed on the target voice information to obtain a first text; scene recognition and scene associated word extraction are performed on the first text to determine the target service scene and the target scene associated word corresponding to the first text; pinyin comparison is performed between the target scene associated word and the scene hot words in the target scene hot word set corresponding to the target service scene to obtain difference scores between the target scene associated word and the scene hot words; the target scene associated word in the first text is replaced with the target scene hot word that has the highest difference score in the target scene hot word set to obtain a second text; and the corresponding service operation is executed according to the user intention of the second text. In this way, the accuracy of speech recognition and the user experience can be improved.

Description

Speech recognition method and related product
Technical Field
The application relates to the technical field of general data processing in the Internet industry, and in particular to a speech recognition method and related products.
Background
With the development of the Internet industry, users interact by voice with devices such as mobile phones, and corresponding services are provided based on the voice information the user inputs during the interaction; accurate speech recognition is therefore essential to ensuring that the service meets the user's needs. At present, speech recognition results are often inaccurate because the user's pronunciation is non-standard or because homophones exist.
Disclosure of Invention
The application provides a speech recognition method and related products to improve the accuracy of speech recognition and the user experience.
In a first aspect, an embodiment of the present application provides a speech recognition method, which is applied to a server in a speech recognition system, where the speech recognition system includes the server and a terminal device for performing speech interaction with a user, the server includes a human-computer interaction engine supporting human-computer speech interaction, and the method includes:
calling the human-computer interaction engine to interact with the user through the terminal equipment, and acquiring target voice information input by the user in the interaction process; performing character recognition on the target voice information to obtain a first text;
performing scene recognition on the first text, and determining a target service scene corresponding to the first text, wherein the target service scene is used for representing a service type which is expressed by the first text and needs to be provided;
extracting scene associated words from the first text to obtain target scene associated words corresponding to the first text, wherein the target scene associated words are used for representing the service content of the service type required to be provided and expressed by the first text;
performing scene hot word set query according to the target service scene to obtain a target scene hot word set corresponding to the target service scene;
performing pinyin comparison between the target scene associated word and the scene hot words in the target scene hot word set to obtain difference scores between the target scene associated word and the scene hot words in the target scene hot word set, where a scene hot word is a word whose popularity is greater than a popularity threshold, and popularity refers to how often the word is queried across all users;
determining a target scene hotword with the highest difference score in the target scene hotword set;
replacing a target scene associated word in the first text with the target scene hot word to obtain a second text;
determining, according to the second text, the user intention expressed by the target voice information; and,
executing the corresponding service operation according to the determined user intention.
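Taken together, the steps above form a recognize, classify, extract, look up, compare, replace, and act pipeline. The following Python sketch is only an illustration of that flow; every helper function is a hypothetical placeholder, since the method does not prescribe any concrete model or API:

```python
def recognize_and_serve(speech_to_text, classify_scene, extract_keyword,
                        hotword_sets, pinyin_score, infer_intent, execute,
                        audio):
    """Sketch of the claimed flow; all seven helpers are placeholders."""
    first_text = speech_to_text(audio)             # character recognition
    scene = classify_scene(first_text)             # target service scene
    keyword = extract_keyword(first_text, scene)   # target scene associated word
    hotwords = hotword_sets[scene]                 # target scene hot word set
    # Pinyin comparison: by the patent's convention, the hot word with
    # the HIGHEST difference score is the most similar one.
    best = max(hotwords, key=lambda hw: pinyin_score(keyword, hw))
    second_text = first_text.replace(keyword, best)  # corrected second text
    intent = infer_intent(second_text)             # user intention
    return execute(intent)                         # corresponding service operation
```

Each placeholder corresponds to one numbered step, so the server-side components can be swapped independently.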
In a second aspect, an embodiment of the present application provides a speech recognition apparatus, which is applied to a server in a speech recognition system, where the speech recognition system includes a server and a terminal device for performing speech interaction with a user, the server includes a human-computer interaction engine supporting human-computer speech interaction, and the apparatus includes:
the acquisition unit is used for calling the human-computer interaction engine to interact with the user through the terminal equipment, and acquiring target voice information input by the user in the interaction process; performing character recognition on the target voice information to obtain a first text;
a scene recognition unit, configured to perform scene recognition on the first text, and determine a target service scene corresponding to the first text, where the target service scene is used to represent a service type that needs to be provided and is expressed by the first text;
the scene associated word extracting unit is used for extracting a scene associated word from the first text to obtain a target scene associated word corresponding to the first text, wherein the target scene associated word is used for representing the service content of the service type required to be provided and expressed by the first text;
the scene hot word set query unit is used for carrying out scene hot word set query according to the target service scene to obtain a target scene hot word set corresponding to the target service scene;
a comparison unit, configured to perform pinyin comparison between the target scene associated word and the scene hot words in the target scene hot word set to obtain difference scores between the target scene associated word and the scene hot words in the target scene hot word set, where a scene hot word is a word whose popularity is greater than a popularity threshold, and popularity refers to how often the word is queried across all users;
the first determining unit is used for determining a target scene hot word with the highest difference value score in the target scene hot word set;
the replacing unit is used for replacing the target scene associated words in the first text with the target scene hot words to obtain a second text;
a second determining unit, configured to determine, according to the second text, the user intention expressed by the target voice information; and,
a service unit, configured to execute the corresponding service operation according to the determined user intention.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, and one or more programs, stored in the memory and configured to be executed by the processor, where the program includes instructions for performing the steps in the method according to the first aspect of the embodiment of the present application.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program/instructions are stored, which, when executed by a processor, implement the steps of the method according to the first aspect of the embodiments of the present application.
In a fifth aspect, the present application provides a computer program product, which includes a computer program/instruction, and when executed by a processor, implements the steps of the method according to the first aspect of the present application.
In the embodiment of the application, the server first calls the human-computer interaction engine to interact with the user through the terminal device and obtains the target voice information input by the user during the interaction. It performs character recognition on the target voice information to obtain a first text, performs scene recognition and scene associated word extraction on the first text to determine the target service scene and target scene associated word corresponding to the first text, and performs pinyin comparison between the target scene associated word and the scene hot words in the target scene hot word set corresponding to the target service scene to obtain the difference scores between the associated word and the hot words. The target scene associated word in the first text is then replaced with the target scene hot word that has the highest difference score in the set to obtain a second text, the user intention expressed by the target voice information is determined according to the second text, and finally the corresponding service operation is executed according to that intention. In this way, the server corrects the first text obtained by voice recognition through scene recognition, scene associated word extraction, and pinyin comparison between the associated word and the scene hot words, yielding a corrected second text. This avoids inaccurate recognition results caused by non-standard pronunciation or homophones, which helps improve the accuracy of speech recognition and the user experience.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a block diagram of a speech recognition system according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a speech recognition method according to an embodiment of the present application;
fig. 3a is a schematic flowchart of a process for obtaining a difference score between a target scene associated word and a scene hot word in a target scene hot word set according to an embodiment of the present application;
fig. 3b is a schematic diagram of a first server interacting with a terminal device according to an embodiment of the present application;
fig. 3c is a schematic diagram of a second server interacting with a terminal device according to an embodiment of the present application;
fig. 3d is a schematic diagram of a third server interacting with a terminal device according to an embodiment of the present application;
fig. 3e is a schematic diagram of a fourth server interacting with a terminal device according to an embodiment of the present application;
fig. 3f is a schematic diagram of a fifth server interacting with a terminal device according to an embodiment of the present application;
fig. 4 is a block diagram illustrating functional units of a speech recognition apparatus according to an embodiment of the present disclosure;
fig. 5 is a block diagram illustrating functional units of another speech recognition apparatus according to an embodiment of the present application;
fig. 6 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the foregoing drawings are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
First, a system architecture according to an embodiment of the present application will be described.
Referring to fig. 1, fig. 1 is a block diagram of a speech recognition system according to an embodiment of the present application. As shown in fig. 1, a voice recognition system 10 includes a server 11 and a terminal device 12 for performing voice interaction with a user; the server 11 is in communication connection with the terminal device 12 and includes a human-computer interaction engine supporting human-computer voice interaction. The server 11 interacts with the user through the terminal device 12 by calling the human-computer interaction engine, obtains the target voice information input by the user during the interaction, performs character recognition on the target voice information to obtain a first text, analyzes the user intention expressed by the target voice information according to the first text, and executes the corresponding service operation according to the determined user intention. The server 11 may be a single server, a server cluster composed of multiple servers, or a cloud computing service center; the terminal device 12 may be a mobile phone, a tablet computer, a notebook computer, or the like.
Based on this, the embodiment of the present application provides a speech recognition method, and the following describes the embodiment of the present application in detail with reference to the accompanying drawings.
Referring to fig. 2, fig. 2 is a flowchart of a speech recognition method according to an embodiment of the present application. The method is applied to the server 11 in the speech recognition system 10 shown in fig. 1; the speech recognition system 10 includes the server 11 and a terminal device 12 for performing voice interaction with the user, and the server 11 includes a human-computer interaction engine supporting human-computer voice interaction. As shown in fig. 2, the method includes:
step 201, calling the human-computer interaction engine to interact with the user through the terminal equipment, and acquiring target voice information input by the user in the interaction process; and performing character recognition on the target voice information to obtain a first text.
Step 202, performing scene recognition on the first text, and determining a target service scene corresponding to the first text.
The target service scene is used to represent the service type, expressed by the first text, that needs to be provided. The service scene may be, but is not limited to, one of a song listening service scene, a novel reading service scene, a shopping service scene, a video service scene, and a navigation service scene.
For example, when the service scene is a song listening service scene, the service type may be a song playing service; when the service scene is a novel reading service scene, the service type may be a novel pushing service; when the service scene is a shopping service scene, the service type may be a commodity pushing service; and when the service scene is a navigation service scene, the service type may be a navigation service.
Step 203, performing scene associated word extraction on the first text to obtain a target scene associated word corresponding to the first text.
The target scene associated word is used for representing the service content of the service type which is expressed by the first text and needs to be provided.
Illustratively: if the service type is a song playing service, the scene associated word is a person's name, and the service content is that the server pushes songs associated with that name to the terminal device for playback; the name may be, but is not limited to, that of a song's singer, writer, or composer. If the service type is a novel pushing service, the scene associated word is a person's name, and the service content is that the server pushes novel information associated with that name to the terminal device for display; the name may indicate the novel's author, a recommender, or an instructor, without specific limitation. If the service type is a commodity pushing service, the scene associated word is a commodity name, and the service content is that the server pushes commodity information associated with that name to the terminal device for display. If the service type is a navigation service, the scene associated word is a place name, and the service content is that the server pushes navigation information associated with that place name to the terminal device so that the terminal device can navigate the user there.
Step 204, querying the scene hot word set according to the target service scene to obtain the target scene hot word set corresponding to the target service scene.
The correspondence between the various service scenes and their scene hot word sets is preset.
Step 205, performing pinyin comparison on the target scene associated word and the scene hotword in the target scene hotword set to obtain a difference value score between the target scene associated word and the scene hotword in the target scene hotword set.
A scene hot word is a word whose popularity is greater than a popularity threshold; popularity in this application refers to how often the word is queried across all users. The higher a word's popularity, the more times all users have queried it; conversely, the lower its popularity, the fewer times it has been queried.
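Under this definition, a scene hot word set can be built by thresholding a per-word query counter. A minimal sketch, assuming the query logs of all users are available as a flat list of queried words (the function and its inputs are our own illustration):

```python
from collections import Counter

def build_hotword_set(queried_words, popularity_threshold):
    """A word becomes a scene hot word when its query count across
    all users exceeds the popularity threshold."""
    heat = Counter(queried_words)  # word -> number of queries
    return {word for word, count in heat.items() if count > popularity_threshold}
```

In practice the counter would be maintained incrementally per service scene rather than rebuilt from a raw log.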
It should be noted that the larger the difference score between two words, the higher their similarity, i.e., the more alike the two words are; conversely, the smaller the difference score, the lower their similarity, i.e., the greater the difference between the two words.
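One way to realize a score with this polarity, where a larger value means more similar, is a normalized edit similarity over the two words' pinyin strings. The sketch below assumes the pinyin conversion happens elsewhere (for example via a pinyin library) and takes pinyin strings directly:

```python
from difflib import SequenceMatcher

def difference_score(word_pinyin: str, hotword_pinyin: str) -> float:
    """Returns a score in [0, 1]; following the patent's convention,
    a LARGER score means the two pronunciations are MORE similar,
    and identical pinyin yields the maximum score of 1.0."""
    return SequenceMatcher(None, word_pinyin, hotword_pinyin).ratio()
```

A production system would likely score syllable-by-syllable and weight initials, finals, and tones separately, but any monotone similarity measure fits the flow.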
Step 206, determining the target scene hotword with the highest difference value score in the target scene hotword set.
Step 207, replacing the target scene associated word in the first text with the target scene hot word to obtain a second text.
Step 208, determining, according to the second text, the user intention expressed by the target voice information.
Step 209, executing the corresponding service operation according to the determined user intention.
The target scene hot word set may or may not contain a scene hot word identical to the target scene associated word.
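Steps 206 and 207 amount to a select-then-substitute operation. A minimal sketch, assuming the associated word occurs verbatim in the first text (the function and parameter names are our own):

```python
def correct_first_text(first_text, keyword, scored_hotwords):
    """scored_hotwords: {scene hot word: difference score}. Picks the
    hot word with the highest difference score (step 206) and substitutes
    it for the associated word in the first text (step 207)."""
    best = max(scored_hotwords, key=scored_hotwords.get)
    return first_text.replace(keyword, best)
```

If the set already contains a hot word identical to the associated word, that word scores highest and the substitution leaves the text unchanged.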
It can be seen that, in the embodiment of the application, the server first calls the human-computer interaction engine to interact with the user through the terminal device and obtains the target voice information input by the user during the interaction. It performs character recognition on the target voice information to obtain a first text, performs scene recognition and scene associated word extraction on the first text to determine the target service scene and target scene associated word corresponding to the first text, and performs pinyin comparison between the target scene associated word and the scene hot words in the target scene hot word set corresponding to the target service scene to obtain the difference scores between the associated word and the hot words. The target scene associated word in the first text is replaced with the target scene hot word that has the highest difference score in the set to obtain a second text, the user intention expressed by the target voice information is determined according to the second text, and finally the corresponding service operation is executed according to that intention. In this way, the server corrects the first text obtained by voice recognition through scene recognition, scene associated word extraction, and pinyin comparison between the associated word and the scene hot words, yielding a corrected second text. This avoids inaccurate recognition results caused by non-standard pronunciation or homophones, which helps improve the accuracy of speech recognition and the user experience.
For convenience of understanding, a process of obtaining a difference value score between a target scene related word and a scene hotword in a target scene hotword set in the embodiment of the present application will be described below.
Referring to fig. 3a, fig. 3a is a schematic flowchart of a process for obtaining a difference score between the target scene associated word and the scene hot words in the target scene hot word set according to an embodiment of the present application. As shown in fig. 3a, the flow A for obtaining these difference scores includes:
step 301, determining whether a first vocabulary completely identical to the pinyin of the target scene associated word exists in the target scene hot word set.
If so, go to step 302.
Step 302, determining whether the number of the first vocabulary is greater than 1.
After step 302, if yes, step 303 is executed.
Step 303, determining whether the user has queried the first vocabulary.
After step 303, if yes, go to step 304.
Step 304, determining whether the number of second words, i.e., first words that the user has queried, is greater than 1.
After step 304, if yes, step 305 is performed.
Step 305, determining whether the time interval between the user's query time for each second word and the current time is greater than a preset interval.
The preset interval may be, for example, 10 days, 15 days, or 30 days, and is not specifically limited.
After step 305, if yes, go to step 306.
Step 306, determining that the second word the user has queried the most times has the highest difference score.
For example, suppose the preset interval is 10 days, the target scene associated word is "ashan", and the target scene hot word set is scene hot word set B, which contains 4 first words with exactly the same pinyin as "ashan": "ashan", "arsan", "araucaria", and "argy". The user queried "arsan" 11 days before the current time, 5 times in total, queried "argy" 12 days before the current time, once, and has no query records for "ashan" or "araucaria". In the above flow for obtaining the difference scores between the target scene associated word and the scene hot words: it is first determined that first words identical in pinyin to "ashan" exist in set B (step 301); their number, 4, is greater than 1 (step 302); the user has queried "arsan" and "argy" (step 303); the number of queried second words, 2, is greater than 1 (step 304); the intervals between each query time and the current time, 11 days and 12 days, are both greater than the 10-day preset interval (step 305); therefore "arsan", the second word queried the most times, is determined to have the highest difference score (step 306).
For example, with reference to fig. 3b in combination with a specific application scenario, where fig. 3b is a schematic diagram of interaction between a first server and a terminal device provided in an embodiment of the present application: the server asks the user what help is needed; the terminal device obtains the target voice information input by the user: "I want to read the fiction of ashan"; based on the above flow, the server obtains the second text "I want to read the fiction of arsan", determines from the second text that the user's intention is to read the fiction of arsan, and pushes a first page to the terminal device, where the first page includes the website link "www." of the fiction of arsan and a user operation prompt message. Preferably, the target scene hot word "arsan" in the prompt message can be highlighted, for example bolded or shown in a darker color. The terminal device displays the first page, and the user clicks "www." to open the fiction.
As an optional flow branch, after step 305, if the determination is no, the flow A further includes executing step 307.
Step 307, determining that the second word whose query time is closest to the current time has the highest difference score.
For example, suppose the preset interval is 5 days, the target scene associated word is "ashan", and the target scene hot word set is scene hot word set A, which contains 3 first words with exactly the same pinyin as "ashan": "ashan", "arsal", and "argy". The user queried "ashan" 3 days before the current time, "arsal" 4 days before the current time, and "argy" 7 days before the current time. In the above flow for obtaining the difference scores between the target scene associated word and the scene hot words: it is first determined that first words identical in pinyin to "ashan" exist in set A (step 301); their number, 3, is greater than 1 (step 302); the user has queried "ashan", "arsal", and "argy" (step 303); the number of queried second words, 3, is greater than 1 (step 304); the intervals between the query times for "ashan" and "arsal" and the current time, 3 days and 4 days, are less than the 5-day preset interval (step 305); therefore "ashan", the second word whose query time is closest to the current time, is determined to have the highest difference score (step 307). Illustratively, in conjunction with a specific application scenario: the server asks the user what service is needed; the terminal device obtains the target voice information input by the user: "I want to listen to the song of ashan"; based on the above flow, the server obtains the second text "I want to listen to the song of ashan" and determines from it that the subsequent reply is "The song of ashan will be played for you". Meanwhile, the server pushes the song of ashan to the terminal device so that the terminal device can play it.
As another optional flow branch, after step 302, if the number of first-vocabulary words is equal to 1, flow A further includes determining that the difference score of that first-vocabulary word is the highest.
For example, the target scene associated word is "Argy", the target scene hotword set is scene hotword set C, and set C contains exactly 1 first-vocabulary word, "Argy" (written with different characters but identical pinyin). In the above process of obtaining the difference scores between the target scene associated word and the scene hotwords in the target scene hotword set, it is first determined that a first-vocabulary word with the same pinyin as "Argy" exists in set C, and then, since the number of such words, 1, is equal to 1, the difference score of "Argy" is determined to be the highest. For example, referring to fig. 3c in conjunction with a concrete application scenario, where fig. 3c is a second schematic diagram of interaction between the server and the terminal device provided in an embodiment of the present application, the server asks the user what service is needed; if the terminal device obtains the target voice information input by the user, "I want to listen to the song of Argy", the server obtains the second text "I want to listen to the song of Argy" through the above flow, determines that the user's intent is to listen to the song of Argy, and pushes a second page to the terminal device, the second page containing a user prompt such as "The song of Argy will be played for you"; preferably, the target scene hotword "Argy" in the prompt can be highlighted, for example in bold, a deeper color, or a larger font. The terminal device displays the second page; the server then pushes the song of Argy to the terminal device, and the terminal device plays the song of Argy pushed by the server.
As another optional flow branch, after step 303, if the user has not queried any first-vocabulary word, flow A further includes determining that the first-vocabulary scene hotword with the highest heat has the highest difference score.
For example, the target scene associated word is "Assam", the target scene hotword set is scene hotword set D, and set D contains two first-vocabulary words whose pinyin is identical to that of "Assam": "Assam" and "Argy"; the user has queried neither, the heat of "Assam" is 1113, and the heat of "Argy" is 6001. In the above process of obtaining the difference scores between the target scene associated word and the scene hotwords in the target scene hotword set, it is first determined that "Assam" and "Argy", with the same pinyin as the associated word, exist in set D; next, their number, 2, is determined to be greater than 1; then it is determined that the user has queried neither, so "Argy", the hotter of the two, is assigned the highest difference score. For example, referring to fig. 3d in conjunction with a concrete application scenario, where fig. 3d is a third schematic diagram of interaction between the server and the terminal device provided in an embodiment of the present application, the target voice information obtained by the terminal device is "the show with Argy"; through the above flow the server obtains the second text "the show with Argy", determines from it that the user's intent is to watch the show of Argy, and pushes a third page to the terminal device, the third page containing user query information such as "Play Argy's show?" together with a first button "Yes" and a second button "No", with the target scene hotword "Argy" in the query information highlighted by an enlarged font. The terminal device displays the third page; then, if the user clicks the first button "Yes", the server pushes the TV show performed by Argy to the terminal device.
As another optional flow branch, after step 304, if the number of second-vocabulary words is equal to 1, flow A further includes determining that the difference score of that second-vocabulary word is the highest.
For example, the target scene associated word is "Ashan", the target scene hotword set is scene hotword set E, set E contains 5 first-vocabulary words whose pinyin is identical to that of "Ashan", among them "Arsin", and the user has queried only "Arsin". In the above process of obtaining the difference scores between the target scene associated word and the scene hotwords in the target scene hotword set, it is first determined that the 5 first-vocabulary words with the same pinyin exist in set E; next, their number, 5, is determined to be greater than 1; then it is determined that the user has queried "Arsin"; and finally, since the number of queried second-vocabulary words, 1, is equal to 1, the difference score of "Arsin" is determined to be the highest. For example, referring to fig. 3e in conjunction with a concrete application scenario, where fig. 3e is a fourth schematic diagram of interaction between the server and the terminal device provided in an embodiment of the present application, if the terminal device obtains the target voice information input by the user, "What is said in Ashan", the server obtains the second text "What is said in Arsin" through the above flow, determines from the second text that the user's intent is to learn the dialect of Arsin, and pushes a fourth page containing the dialect information of Arsin to the terminal device; the terminal device displays the fourth page so that the user can conveniently view the dialect of Arsin.
As another optional flow branch, after step 301, if no first-vocabulary word exists, flow A further includes performing pinyin replacement on the pinyin of the target scene associated word to obtain replaced pinyins, and comparing the replaced pinyins with the scene hotwords in the target scene hotword set to obtain the difference scores between the target scene associated word and the scene hotwords in the target scene hotword set.
For example, the target scene associated word is "Asan", the target scene hotword set is scene hotword set F, and no first-vocabulary word whose pinyin is identical to that of "Asan" exists in set F. In the above process of obtaining the difference scores between the target scene associated word and the scene hotwords in the target scene hotword set, it is first determined that no such first-vocabulary word exists in set F; pinyin replacement is then performed on the pinyin of "Asan" (the concrete pinyin strings appear only as images in the source) to obtain a replaced pinyin, and the replaced pinyin is compared with the scene hotwords in set F to obtain the difference scores between "Asan" and the scene hotwords in set F.
In this example, by combining the scene hotwords in the target scene hotword set with the pinyin of the target scene associated word, together with the user's query times for, and the number of queried words among, those scene hotwords, the server can accurately determine the target scene hotword with the highest difference score in the target scene hotword set, which improves the accuracy of the target scene hotword.
In a possible example, performing pinyin replacement on the pinyin of the target scene associated word to obtain the replaced pinyins may be implemented as follows (without limitation): determining the user's native place and/or living address; determining the pronunciation features corresponding to that native place and/or living address; determining, according to the pronunciation features, the number of pinyins among those of the target scene associated word that can be replaced; and if that number is greater than 1, performing pinyin replacement sequentially according to the order in which the characters needing replacement appear in the target scene associated word, to obtain multiple replaced pinyins.
The living address may include an address where the user currently lives and has lived for more than a first preset time; the first preset time may be one year, two years, half a year, and so on. Illustratively, if the first preset time is two years and the user currently lives in place A and has lived there for 4 years, place A is a living address of the user.
In addition, the living address may further include an address where the user once lived for more than a second preset time; the second preset time may be two years, three years, five years, and so on. The first preset time and the second preset time may be the same, but preferably the second preset time is longer than the first. Illustratively, if the first preset time is one year and the second preset time is five years, and the user currently lives in place A and has lived there for two years, once lived in place B for 3 years, and once lived in place C for 6 years, then places A and C are both living addresses of the user.
It is understood that there may be one or more living addresses, and that the native place and the living addresses may or may not coincide; for example, if the user's native place is place A, the living addresses may be places A and B, or places B and C.
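The living-address rules above can be sketched as follows; the function name, the year-based durations, and the tuple shape are illustrative assumptions:

```python
# Hedged sketch of the living-address rules: the current residence qualifies
# after the first preset time, a past residence after the (longer) second
# preset time. Durations are in years.

def living_addresses(residences, first_preset=1, second_preset=5):
    """residences: list of (place, years_lived, currently_living) tuples."""
    qualified = []
    for place, years, current in residences:
        if years > (first_preset if current else second_preset):
            qualified.append(place)
    return qualified

# The example from the text: currently in place A for 2 years, formerly in
# place B for 3 years and in place C for 6 years, so A and C qualify.
print(living_addresses([("A", 2, True), ("B", 3, False), ("C", 6, False)]))  # ['A', 'C']
```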
For example, the target scene associated word is "Zhousan", the target scene hotword set is scene hotword set G, no first-vocabulary word whose pinyin is completely identical to that of "Zhousan" exists in set G, the user's native place is place A, and the pronunciation feature of place A is that flat and retroflex tongue sounds are not distinguished. In a specific implementation, after it is determined that no first-vocabulary word completely identical in pinyin to "Zhousan" exists in set G, the user's native place A is determined, pronunciation feature 1 of place A, namely that flat and retroflex sounds are not distinguished, is determined, and according to pronunciation feature 1 it is determined that the pinyins of both characters of "Zhousan", "Zhou" and "San", can be replaced; the number of replaceable pinyins, 2, is greater than 1, so pinyin replacement is performed sequentially according to the order in which the characters "Zhou" and "San" appear in "Zhousan", yielding three replaced pinyins: one with only the first syllable replaced, one with only the second syllable replaced, and one with both replaced (the concrete pinyin strings appear only as images in the source).
As another example, the target scene associated word is "go eat rice", the target scene hotword set is scene hotword set H, no first-vocabulary word whose pinyin is identical to that of "go eat rice" exists in set H, the user's living address is place B, and the pronunciation features of place B are that "go" and "eat" are each pronounced differently from standard Mandarin, for instance "chi fan" (eat rice) is pronounced "qi fan". In a specific implementation, after it is determined that no first-vocabulary word completely identical in pinyin to "go eat rice" exists in set H, the user's living address B is determined, these pronunciation features of place B are determined, and according to them it is determined that the pinyins of the characters "go" and "eat" in "go eat rice" can both be replaced; the number of replaceable pinyins, 2, is greater than 1, so pinyin replacement is performed sequentially according to the order in which those characters appear in "go eat rice", yielding three replaced pinyins (the concrete pinyin strings appear only as images in the source).
As another example, the target scene associated word is "word boom", the target scene hotword set is scene hotword set I, no first-vocabulary word whose pinyin is identical to that of "word boom" exists in set I, the user's native place is place A, the user's living addresses are places A and B, the pronunciation feature of place A is that flat and retroflex tongue sounds are not distinguished, and the pronunciation feature of place B is that certain characters of "word boom" are pronounced differently from their standard pinyin. In a specific implementation, after it is determined that no first-vocabulary word completely identical in pinyin to "word boom" exists in set I, the user's native place A and living addresses A and B are determined, the flat/retroflex feature of place A and the non-standard pronunciation feature of place B, pronunciation feature 3, are determined, and according to pronunciation feature 3 it is determined that the pinyins of two characters of "word boom" can be replaced; the number of replaceable pinyins, 2, is greater than 1, so pinyin replacement is performed sequentially according to the order in which those characters appear in "word boom", yielding three replaced pinyins (the concrete pinyin strings appear only as images in the source).
In this example, when performing pinyin replacement on the pinyin of the target scene associated word, the server can accurately determine the user's pronunciation features from the user's native place and/or living address, and replace each pinyin of the target scene associated word according to those features to obtain the replaced pinyins; this ensures that the replaced pinyins conform to the user's pronunciation habits and thereby improves their reliability.
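The sequential replacement described above, where two replaceable syllables yield three replaced pinyins, can be sketched as follows; the syllable spellings and the flat/retroflex substitution table are illustrative assumptions:

```python
from itertools import combinations

# Hedged sketch of the sequential pinyin replacement. The feature table (a
# flat/retroflex merger mapping "zhou"->"zou" and "san"->"shan") is assumed;
# the point is that 2 replaceable syllables yield 3 replaced pinyins.

def replaced_pinyins(syllables, feature_map):
    idxs = [i for i, s in enumerate(syllables) if s in feature_map]
    variants = []
    # Replace one syllable at a time in order of appearance, then in combination.
    for r in range(1, len(idxs) + 1):
        for combo in combinations(idxs, r):
            v = list(syllables)
            for i in combo:
                v[i] = feature_map[v[i]]
            variants.append(" ".join(v))
    return variants

feature = {"zhou": "zou", "san": "shan"}  # assumed pronunciation feature
print(replaced_pinyins(["zhou", "san"], feature))
# -> ['zou san', 'zhou shan', 'zou shan']
```

With n replaceable syllables this yields 2^n - 1 variants, matching the two-syllable examples in the text.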
In a possible example, the implementation manner of comparing the replaced pinyin with the scene hotword in the target scene hotword set to obtain the difference value score between the target scene associated word and the scene hotword in the target scene hotword set may include, but is not limited to:
step A1, determining whether a target alternative pinyin which is completely the same as the pinyin of the scene hotword in the target scene hotword set exists in the multiple alternative pinyins.
After step A1, if present, step A2 is performed.
And A2, determining the number of the target alternative pinyins.
After the step A2, if the number of the target alternative pinyins is 1, the step A3 is executed.
And A3, determining that the difference value score of the scene hot word corresponding to the target alternative pinyin is highest.
For example, the target scene associated word is "word boom", the target scene hotword set is scene hotword set I, and three alternative pinyins of "word boom" have been determined (the concrete pinyin strings appear only as images in the source). Among them, the second alternative pinyin is completely identical to the pinyin of a scene hotword in set I, and the corresponding scene hotword is "late wind". In a specific implementation, it is first determined that, among the three alternative pinyins, there exists one, the second, identical to the pinyin of a scene hotword in set I; then, since the number of such target alternative pinyins is 1, the difference score of "late wind" is determined to be the highest. For example, referring to fig. 3f in conjunction with a concrete application scenario, where fig. 3f is a fifth schematic diagram of interaction between the server and the terminal device provided in an embodiment of the present application, the target voice information obtained by the terminal device is "Play the music word boom"; the server obtains the second text "Play the music late wind", determines from the second text that the user's intent is to listen to the song "late wind", and pushes a fifth page to the terminal device, the fifth page containing user query information such as "Play the song?" together with a first button "Yes" and a second button "No", with the target scene hotword "late wind" shown in bold relative to the other words; the terminal device displays the fifth page; then, if the user clicks the first button "Yes", the server pushes the song "late wind" to the terminal device, and the terminal device plays the song pushed by the server.
As an optional branch, after step A1, if no target alternative pinyin identical to the pinyin of a scene hotword in the target scene hotword set exists among the multiple alternative pinyins, prompt information is generated and sent to the terminal device to prompt the user that the user's intent was not recognized.
As an optional branch, after the step A2, if the number of the target alternative pinyins is at least two, the step A4 is executed.
And step A4, calculating the difference score of the scene hotword corresponding to each target alternative pinyin according to the number of replaced syllables in that target alternative pinyin and the number of times the user has used, or the heat of, the corresponding scene hotword.
For example, the target scene associated word is "Zhousan", the target scene hotword set is scene hotword set G, and three alternative pinyins of "Zhousan" have been determined (the concrete pinyin strings appear only as images in the source). Among them, the second and the third are completely identical to pinyins of scene hotwords in set G: the scene hotwords corresponding to the second alternative pinyin are "Zhoushan" and "Subson", and the scene hotword corresponding to the third is "Zhouqi". In a specific implementation, it is first determined that, among the three alternative pinyins, the second and the third are identical to pinyins of scene hotwords in set G; then, since the number of target alternative pinyins, 2, is at least two, the difference scores of "Zhoushan", "Subson", and "Zhouqi" are calculated according to the number of replaced syllables in the second and third alternative pinyins and the user's number of uses of, or the heat of, "Zhoushan", "Subson", and "Zhouqi".
In this example, the server can determine the difference scores of the scene hotwords in the target scene hotword set by comparing the alternative pinyins with the pinyins of those scene hotwords, which improves the convenience and intelligence of obtaining the difference scores.
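The branching of steps A1 to A3, with a fall-through to step A4, can be sketched as follows; the function name, return values, and example pinyin strings are illustrative assumptions, not the application's actual interfaces:

```python
# Hedged sketch of steps A1-A3. The dictionary maps each hotword pinyin to
# the scene hotwords written with that pinyin.

def match_alternatives(alternative_pinyins, hotwords_by_pinyin):
    hits = [p for p in alternative_pinyins if p in hotwords_by_pinyin]
    if not hits:
        # Step A1 branch: no match, prompt that the intent was not recognized.
        return "prompt-user"
    if len(hits) == 1:
        # Step A3: the hotword of the single matching pinyin scores highest.
        return hotwords_by_pinyin[hits[0]]
    # Steps A2/A4: several matches fall through to usage/heat scoring.
    return "score-by-usage-and-heat"

table = {"wan lai feng": ["late wind"]}  # assumed pinyin for the "late wind" example
print(match_alternatives(["ci dao bang", "wan lai feng"], table))
```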
Specifically, the implementation manner of step A4 may include, but is not limited to:
and step B1, determining the first pinyin with the least replaced pinyin in the target replaced pinyins, and determining whether the number of the first pinyins is more than 1.
After the step B1, if the number of the first pinyin is larger than 1, the step B2 is executed.
And B2, determining whether the user uses the scene hot word corresponding to the first pinyin.
After the step B2, if the user uses the scene hotword corresponding to the first pinyin, the step B3 is executed.
And B3, determining whether the number of third words used by the user in the scene hot words corresponding to the first pinyin is greater than 1.
After the step B3, if the number of the third vocabulary is greater than 1, the step B4 is executed.
And step B4, determining that the difference value score of the scene hot word with the highest use frequency or the highest heat degree in the third vocabulary is highest.
For example, the target scene associated word is "Zhousan" (its pinyin is shown only as an image in the source), the target scene hotword set is scene hotword set J, and among the three alternative pinyins of "Zhousan" there are two target alternative pinyins, the first and the third, identical to pinyins of scene hotwords in set J; the number of target alternative pinyins determined is 2. The scene hotword corresponding to the first target alternative pinyin is "Zhousan", and those corresponding to the third are "Zhoujian" and "Zhoushan"; the user has used "Zhousan" twice, "Zhoujian" 5 times, and "Zhoushan" 11 times, and the heat of "Zhousan" is 301, the heat of "Zhoujian" is 8032, and the heat of "Zhoushan" is 26. In a specific implementation, it is first determined that the first pinyins, those with the fewest replaced syllables among the target alternative pinyins, are the first and the third; next, the number of first pinyins, 2, is determined to be greater than 1; after that, the third vocabulary, the scene hotwords corresponding to the first pinyins that the user has used, is determined to be "Zhousan", "Zhoujian", and "Zhoushan"; and since the number of third-vocabulary words, 3, is greater than 1, it is determined that the most frequently used of the three, "Zhoushan", has the highest difference score, or alternatively that the hottest of the three, "Zhoujian", has the highest difference score.
As an optional branch, after step B3, if the number of the third vocabulary is equal to 1, step B5 is executed.
And step B5, determining that the difference value score of the third vocabulary is highest.
For example, the target scene associated word is "Zhousan" (its pinyin is shown only as an image in the source), the target scene hotword set is scene hotword set K, and among the three alternative pinyins of "Zhousan" there are two target alternative pinyins, the first and the third, identical to pinyins of scene hotwords in set K; the number of target alternative pinyins determined is 2. The scene hotword corresponding to the first target alternative pinyin is "Zhousan", and those corresponding to the third are "Zhoujian" and "Zhoushan"; the user has used "Zhoushan" 11 times and has used neither "Zhousan" nor "Zhoujian". In a specific implementation, it is first determined that the first pinyins, those with the fewest replaced syllables among the target alternative pinyins, are the first and the third; next, the number of first pinyins, 2, is determined to be greater than 1; after that, the third vocabulary, the scene hotwords corresponding to the first pinyins that the user has used, is determined to be "Zhoushan" alone; and since the number of third-vocabulary words, 1, is equal to 1, it is determined that "Zhoushan" has the highest difference score among "Zhousan", "Zhoujian", and "Zhoushan".
As another optional branch, after step B2, if the user does not use the scene hotword corresponding to the first pinyin, step B6 is executed.
And B6, determining that the difference value score of the scene hot word with the highest heat in the scene hot words corresponding to the first pinyin is the highest.
For example, the target scene associated word is "Zhousan" (its pinyin is shown only as an image in the source), the target scene hotword set is scene hotword set K, and among the three alternative pinyins of "Zhousan" there are two target alternative pinyins, the second and the third, identical to pinyins of scene hotwords in set K; the number of target alternative pinyins determined is 2. The scene hotword corresponding to the second target alternative pinyin is "Zhousan", and those corresponding to the third are "Zhoujian" and "Zhoushan"; the user has used none of the three, and the heat of "Zhousan" is 301, the heat of "Zhoujian" is 8032, and the heat of "Zhoushan" is 26. In a specific implementation, it is first determined that the first pinyins, those with the fewest replaced syllables among the target alternative pinyins, are the second and the third; next, the number of first pinyins, 2, is determined to be greater than 1; after that, it is determined that the user has used none of the scene hotwords corresponding to the first pinyins, so the hottest of "Zhousan", "Zhoujian", and "Zhoushan", namely "Zhoujian", is determined to have the highest difference score.
As another optional branch, after step B1, if the number of the first pinyins is equal to 1, step B7 is executed.
And B7, determining that the difference value score of the scene hot word corresponding to the first pinyin is highest.
For example, the target scene associated word is "Zhousan" (its pinyin is shown only as an image in the source), the target scene hotword set is scene hotword set G, and among the three replaced pinyins of "Zhousan", the second and the third are completely identical to pinyins of scene hotwords in set G; it is determined that the number of these target alternative pinyins is 2, that the scene hotwords corresponding to the second are "Zhoushan" and "Subson", and that the scene hotword corresponding to the third is "Zhouqi". In a specific implementation, it is first determined that, between the second and the third target alternative pinyins, the first pinyin, the one with the fewest replaced syllables, is the third; then, since the number of first pinyins, 1, is equal to 1, the scene hotword corresponding to the third target alternative pinyin, "Zhouqi", is determined to have the highest difference score.
In this example, when there are at least two target alternative pinyins, the server can calculate the difference score of the scene hotword corresponding to each target alternative pinyin according to the number of replaced syllables in that target alternative pinyin and the number of times the user has used, or the heat of, the corresponding scene hotword, which improves the comprehensiveness and accuracy of determining the difference scores of the scene hotwords.
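The cascade of steps B1 to B7 can be sketched as follows; the candidate tuple shape (hotword, number of replaced syllables, user use count, heat) is an illustrative assumption:

```python
# Hedged sketch of the step-B cascade (B1-B7): prefer candidates reachable
# with the fewest replaced syllables, then break ties by the user's usage,
# then by heat.

def pick_hotword(candidates):
    fewest = min(n for _, n, _, _ in candidates)              # step B1
    first = [c for c in candidates if c[1] == fewest]         # "first pinyin" words
    if len(first) == 1:                                       # step B7
        return first[0][0]
    used = [c for c in first if c[2] > 0]                     # step B2
    if not used:                                              # step B6: hottest wins
        return max(first, key=lambda c: c[3])[0]
    if len(used) == 1:                                        # step B5
        return used[0][0]
    return max(used, key=lambda c: c[2])[0]                   # step B4: most used

# Three homophone hotwords, all reachable with one replaced syllable:
cands = [("Zhousan", 1, 2, 301), ("Zhoujian", 1, 5, 8032), ("Zhoushan", 1, 11, 26)]
print(pick_hotword(cands))  # Zhoushan, the most frequently used (step B4)
```

Note that step B4's alternative heat-based rule from the text could be obtained by swapping the final key to `c[3]`.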
In one possible example, before determining the user's native place and/or living address, the method further includes: obtaining the user's Mandarin level; and determining that the Mandarin level does not reach a preset level.
The preset level may be, for example, Level 1-B, Level 1-A, or Level 2-A, and can be set as needed.
Furthermore, the method further includes: after the user's Mandarin level is obtained, if the Mandarin level is determined to reach the preset level, generating prompt information and sending it to the terminal device to prompt the user that the user's intent was not recognized.
In this example, when performing pinyin replacement on the pinyin of the target scene associated word, the server can accurately determine the user's pronunciation features from the user's Mandarin level, native place, and/or living address, and replace each pinyin of the target scene associated word according to those features to obtain the replaced pinyins; this ensures that the replaced pinyins better conform to the user's pronunciation habits and further improves their reliability.
It can be understood that, since the method embodiment and the apparatus embodiment are different presentation forms of the same technical concept, the content of the method embodiment portion in the present application should be synchronously adapted to the apparatus embodiment portion, and is not described herein again.
Consistent with the above-described embodiments, as shown in fig. 4, fig. 4 is a block diagram of functional units of a speech recognition apparatus according to an embodiment of the present application. In fig. 4, the speech recognition apparatus 400 is applied to a server in a speech recognition system; the speech recognition system includes the server and a terminal device through which the server performs speech interaction with the user, the server includes a human-computer interaction engine for supporting human-computer speech interaction, and the speech recognition apparatus 400 includes:
an obtaining unit 401, configured to invoke the human-computer interaction engine to interact with the user through the terminal device, acquire target voice information input by the user during the interaction, and perform character recognition on the target voice information to obtain a first text;
a scene recognition unit 402, configured to perform scene recognition on the first text, and determine a target service scene corresponding to the first text, where the target service scene is used to represent a service type that needs to be provided and is expressed by the first text;
a scene related word extracting unit 403, configured to perform scene related word extraction on the first text to obtain a target scene related word corresponding to the first text, where the target scene related word is used to represent service content of the service type that needs to be provided and is expressed by the first text;
a scene hotword set query unit 404, configured to perform scene hotword set query according to the target service scene to obtain a target scene hotword set corresponding to the target service scene;
a comparison unit 405, configured to perform pinyin comparison on the target scene associated word and the scene hotword in the target scene hotword set to obtain a difference score between the target scene associated word and the scene hotword in the target scene hotword set, where the scene hotword is a word whose query hotness is greater than a hotness threshold;
a first determining unit 406, configured to determine a target scene hotword with a highest difference score in the target scene hotword set;
a replacing unit 407, configured to replace the target scene associated word in the first text with the target scene hotword to obtain a second text;
a second determining unit 408, configured to determine, according to the second text, a user intention expressed by the target speech information;
and the service unit 409 is configured to execute a corresponding service operation according to the determined user intention.
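The units 401 to 409 form a single pipeline from transcribed speech to corrected text. A minimal sketch of that flow, with the scene recognizer, keyword extractor, hot-word sets, and scoring function stubbed as hypothetical callables (none of these names or data come from the patent, and the character-level similarity stands in for the patent's pinyin scoring cascade):

```python
# Hypothetical scene hot-word sets: scene -> vocabulary whose query heat
# exceeds the heat threshold across all users.
HOTWORD_SETS = {
    "music": ["blue and white porcelain", "nocturne"],
}

def similarity(a, b):
    # Placeholder for the difference score of units 405-406: the fraction
    # of position-wise matching characters (not the patented scoring).
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

def correct_text(first_text, recognize_scene, extract_keyword):
    """Units 402-407: recognize the scene, extract the scene associated
    word, query the scene's hot-word set, pick the best-scoring hot word,
    and substitute it into the first text to produce the second text."""
    scene = recognize_scene(first_text)           # unit 402
    keyword = extract_keyword(first_text)         # unit 403
    hotwords = HOTWORD_SETS.get(scene, [])        # unit 404
    if not hotwords:
        return first_text
    best = max(hotwords, key=lambda h: similarity(keyword, h))  # 405-406
    return first_text.replace(keyword, best)      # unit 407
```

The returned second text would then feed intent determination (unit 408) and the service operation (unit 409).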
It can be understood that, since the method embodiment and the apparatus embodiment are different presentation forms of the same technical concept, the content of the method embodiment portion in the present application should be synchronously adapted to the apparatus embodiment portion, and is not described herein again.
In the case of using an integrated unit, fig. 5 is a block diagram of the functional units of another speech recognition apparatus provided in an embodiment of the present application. In fig. 5, the speech recognition apparatus 510 includes: a processing module 512 and a communication module 511.
The processing module 512 is configured to invoke the human-computer interaction engine through the communication module 511 to interact with the user through the terminal device, and acquire target voice information input by the user in the interaction process; performing character recognition on the target voice information to obtain a first text; performing scene recognition on the first text, and determining a target service scene corresponding to the first text, wherein the target service scene is used for representing a service type which is expressed by the first text and needs to be provided; extracting scene associated words from the first text to obtain target scene associated words corresponding to the first text, wherein the target scene associated words are used for representing the service content of the service type required to be provided and expressed by the first text; performing scene hot word set query according to the target service scene to obtain a target scene hot word set corresponding to the target service scene; performing pinyin comparison on the target scene associated word and the scene hot word in the target scene hot word set to obtain a difference value score of the target scene associated word and the scene hot word in the target scene hot word set, wherein the scene hot word is a vocabulary with the heat degree greater than a heat degree threshold value, and the heat degree refers to the query heat degree of the vocabulary in all users; determining a target scene hot word with the highest difference value score in the target scene hot word set; replacing a target scene associated word in the first text with the target scene hot word to obtain a second text; determining the user intention expressed by the target voice information according to the second text; and executing corresponding service operation according to the determined user intention. 
For example, the processing module 512 performs the steps of the acquiring unit 401, the scene identifying unit 402, the scene related word extracting unit 403, the scene hotword set querying unit 404, the comparing unit 405, the first determining unit 406, the replacing unit 407, the second determining unit 408, and the service unit 409, and/or other processes of the techniques described herein. The communication module 511 is used to support interaction between the speech recognition apparatus 510 and other devices. As shown in fig. 5, the speech recognition apparatus 510 may further include a storage module 513 for storing program codes and data of the speech recognition apparatus 510.
The processing module 512 may be a processor or a controller, for example a Central Processing Unit (CPU), a general-purpose processor, a Digital Signal Processor (DSP), an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor may also be a combination of computing devices, e.g., one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 511 may be a transceiver, an RF circuit, a communication interface, or the like. The storage module 513 may be a memory.
All relevant content of each scheme involved in the above method embodiment may be found in the functional description of the corresponding functional module, and is not repeated here. The speech recognition device 510 can perform the speech recognition method shown in fig. 2.
The above-described embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. The procedures or functions according to the embodiments of the present application are generated in whole or in part when a computer instruction or a computer program is loaded or executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire or wirelessly. Computer-readable storage media can be any available media that can be accessed by a computer or a data storage device, such as a server, data center, etc., that contains one or more collections of available media. The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media. The semiconductor medium may be a solid state disk.
Fig. 6 is a block diagram of an electronic device according to an embodiment of the present application. As shown in fig. 6, electronic device 600 may include one or more of the following components: a processor 601, a memory 602 coupled to the processor 601, wherein the memory 602 may store one or more programs, and the one or more programs may be configured to implement the methods described in the embodiments as described above when executed by the one or more processors 601. The electronic device 600 may be a server in the voice recognition system.
Processor 601 may include one or more processing cores. The processor 601 connects various parts throughout the electronic device 600 using various interfaces and lines, and performs various functions of the electronic device 600 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 602 and calling data stored in the memory 602. Alternatively, the processor 601 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 601 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and so on; the GPU is used for rendering and drawing display content; the modem is used to handle wireless communications. It can be understood that the modem may also not be integrated into the processor 601, but may instead be implemented by a separate communication chip.
The Memory 602 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 602 may be used to store instructions, programs, code, sets of codes, or sets of instructions. The memory 602 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The stored data area may also store data created during use by the electronic device 600, and the like.
It is understood that the electronic device 600 may include more or fewer structural elements than those shown in the above structural block diagram, for example a power module, physical buttons, a Wireless Fidelity (WiFi) module, a speaker, a Bluetooth module, sensors, and the like, which are not limited herein.
Embodiments of the present application also provide a computer storage medium in which a computer program/instructions are stored; when executed by a processor, the computer program/instructions implement some or all of the steps of any one of the methods described in the above method embodiments.
Embodiments of the present application further provide a computer program product, which includes a computer program/instructions; when executed by a processor, the computer program/instructions implement the steps of the method according to the first aspect of the embodiments of the present application.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not imply any order of execution, and the order of execution of the processes should be determined by their functions and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and system may be implemented in other ways. The apparatus embodiments described above are merely illustrative: the division of the units is only a division by logical function, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard drive, a magnetic disk, an optical disk, volatile memory, or non-volatile memory, and other media that can store program code. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
Although the present invention is disclosed above, the present invention is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions without departing from the spirit and scope of the invention, including different combinations of functions, implementation steps, and software and hardware implementations, all of which fall within the scope of the invention.

Claims (10)

1. A speech recognition method, applied to a server in a speech recognition system, wherein the speech recognition system comprises the server and a terminal device for voice interaction with a user, and the server comprises a human-computer interaction engine supporting human-computer voice interaction, the method comprising the following steps:
calling the human-computer interaction engine to interact with the user through the terminal equipment, and acquiring target voice information input by the user in the interaction process; performing character recognition on the target voice information to obtain a first text;
performing scene recognition on the first text, and determining a target service scene corresponding to the first text, wherein the target service scene is used for representing a service type which is expressed by the first text and needs to be provided;
extracting scene associated words from the first text to obtain target scene associated words corresponding to the first text, wherein the target scene associated words are used for representing the service content of the service type required to be provided and expressed by the first text;
performing scene hot word set query according to the target service scene to obtain a target scene hot word set corresponding to the target service scene;
performing pinyin comparison on the target scene associated word and the scene hot words in the target scene hot word set to obtain a difference value score between the target scene associated word and the scene hot words in the target scene hot word set, wherein the scene hot words are vocabularies with the heat degree greater than a heat degree threshold value, and the heat degree refers to the query heat degree of the vocabularies in all users;
determining a target scene hot word with the highest difference value score in the target scene hot word set;
replacing the target scene associated words in the first text with the target scene hot words to obtain a second text;
determining the user intention expressed by the target voice information according to the second text; and
executing a corresponding service operation according to the determined user intention.
2. The method according to claim 1, wherein performing pinyin comparison on the target scene associated word and the scene hot words in the target scene hot word set to obtain a difference score between the target scene associated word and the scene hot words in the target scene hot word set comprises:
determining whether a first vocabulary completely identical to the pinyin of the target scene associated word exists in the target scene hot word set;
if yes, determining whether the number of the first vocabulary is larger than 1;
if yes, determining whether the user has ever queried any of the first vocabulary;
if yes, determining whether the number of second vocabulary, namely the first vocabulary that the user has queried, is greater than 1;
if yes, determining whether the time interval between the query time of the user for each second vocabulary and the current time is greater than a preset interval;
if yes, determining that the difference value score of the scene hot word with the largest query frequency in the second vocabulary is the highest;
if not, determining that the difference value score of the scene hot word with the shortest time interval between the query time and the current time in the second vocabulary is the highest.
3. The method according to claim 2, wherein after determining whether a first vocabulary completely identical to the pinyin of the target scene associated word exists in the target scene hot word set, if the first vocabulary does not exist, performing pinyin replacement on the pinyin of the target scene associated word to obtain a replaced pinyin, and comparing the replaced pinyin with the scene hot words in the target scene hot word set to obtain a difference score between the target scene associated word and the scene hot words in the target scene hot word set; and
after determining whether the number of the first vocabulary is greater than 1, if the number of the first vocabulary is equal to 1, determining that the difference score of the first vocabulary is the highest; and
after determining whether the user has ever queried any of the first vocabulary, if the user has not queried any of the first vocabulary, determining that the difference score of the scene hot word with the highest heat in the first vocabulary is the highest; and
after determining whether the number of second vocabulary queried among the first vocabulary is greater than 1, if the number of the second vocabulary is equal to 1, determining that the difference score of the second vocabulary is the highest.
4. The method as claimed in claim 3, wherein the performing pinyin replacement on the pinyin of the target scene associated word to obtain the replaced pinyin comprises:
determining the native place and/or living address of the user;
determining pronunciation characteristics corresponding to the native place and/or the life address;
determining the number of pinyin capable of being subjected to pinyin replacement in each pinyin corresponding to the target scene associated word according to the pronunciation characteristics;
and if the pinyin number is larger than 1, sequentially performing pinyin replacement according to the appearance sequence of each character needing pinyin replacement in the target scene associated word to obtain a plurality of replaced pinyins.
5. The method according to claim 4, wherein comparing the replaced pinyin with the scene hot words in the target scene hot word set to obtain a difference score between the target scene associated word and the scene hot words in the target scene hot word set comprises:
determining whether a target replaced pinyin identical to the pinyin of a scene hot word in the target scene hot word set exists among the plurality of replaced pinyins;
if yes, determining the number of the target replaced pinyins;
if the number of the target replaced pinyins is 1, determining that the difference score of the scene hot word corresponding to the target replaced pinyin is the highest;
if the number of the target replaced pinyins is at least two, calculating the difference score of the scene hot word corresponding to each target replaced pinyin according to the number of replaced syllables in the target replaced pinyin and the number of times the user has used, or the heat of, the scene hot word corresponding to that target replaced pinyin.
6. The method of claim 5, wherein calculating the difference score of the scene hot word corresponding to each target replaced pinyin according to the number of replaced syllables in the target replaced pinyin and the number of times the user has used, or the heat of, the corresponding scene hot word comprises:
determining the first pinyin, namely the target replaced pinyin with the fewest replaced syllables, and determining whether the number of the first pinyin is greater than 1;
if the number of the first pinyin is larger than 1, determining whether the user uses the scene hotword corresponding to the first pinyin;
if the user uses the scene hot word corresponding to the first pinyin, determining whether the number of third words used by the user in the scene hot word corresponding to the first pinyin is greater than 1;
if the number of the third vocabulary is larger than 1, determining that the difference value score of the scene hot word with the highest use frequency or the highest heat degree in the third vocabulary is the highest;
if the number of the third vocabulary is equal to 1, determining that the difference value score of the third vocabulary is the highest;
if the user does not use the scene hot word corresponding to the first pinyin, determining that the difference value score of the scene hot word with the highest heat in the scene hot words corresponding to the first pinyin is the highest;
and if the number of the first Pinyin is equal to 1, determining that the difference value score of the scene hotword corresponding to the first Pinyin is the highest.
7. The method of claim 4, wherein before determining the native place and/or living address of the user, the method further comprises:
obtaining the mandarin level of the user;
determining that the Mandarin level does not reach a preset level.
8. A speech recognition apparatus, applied to a server in a speech recognition system, wherein the speech recognition system comprises the server and a terminal device for voice interaction with a user, and the server comprises a human-computer interaction engine supporting human-computer voice interaction, the apparatus comprising:
the acquisition unit is used for calling the human-computer interaction engine to interact with the user through the terminal equipment, and acquiring target voice information input by the user in the interaction process; performing character recognition on the target voice information to obtain a first text;
a scene recognition unit, configured to perform scene recognition on the first text, and determine a target service scene corresponding to the first text, where the target service scene is used to represent a service type that needs to be provided and is expressed by the first text;
the scene associated word extracting unit is used for extracting a scene associated word from the first text to obtain a target scene associated word corresponding to the first text, wherein the target scene associated word is used for representing the service content of the service type required to be provided and expressed by the first text;
the scene hot word set query unit is used for carrying out scene hot word set query according to the target service scene to obtain a target scene hot word set corresponding to the target service scene;
the comparison unit is used for performing pinyin comparison on the target scene associated word and the scene hot words in the target scene hot word set to obtain a difference value score between the target scene associated word and the scene hot words in the target scene hot word set, wherein the scene hot words are words with the heat degree greater than a heat degree threshold value, and the heat degree refers to the query heat degree of the words in all users;
the first determining unit is used for determining a target scene hotword with the highest score of the difference value in the target scene hotword set;
the replacing unit is used for replacing the target scene associated words in the first text with the target scene hot words to obtain a second text;
a second determining unit, configured to determine, according to the second text, a user intention expressed by the target voice information; and
and the service unit is used for executing corresponding service operation according to the determined user intention.
9. An electronic device comprising a processor, memory, and one or more programs stored in the memory and configured to be executed by the processor, the programs including instructions for performing the steps of the method of any of claims 1-7.
10. A computer-readable storage medium on which a computer program/instructions are stored, which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 7.
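By way of illustration only, the exact-pinyin-match branch of the comparison step (claims 2 and 3) amounts to a tie-breaking cascade. A minimal Python sketch under assumed data shapes (the dictionary layouts below are hypothetical; the claims do not prescribe them):

```python
import time

def pick_exact_match(assoc_pinyin, hotwords, history, preset_interval):
    """Claims 2-3 cascade. `hotwords`: word -> {"pinyin", "heat"};
    `history`: word -> {"count", "last_time"} for this user's queries."""
    # First vocabulary: hot words whose pinyin equals the associated word's.
    first = [w for w, v in hotwords.items() if v["pinyin"] == assoc_pinyin]
    if not first:
        return None           # claim 3: fall through to pinyin replacement
    if len(first) == 1:
        return first[0]       # claim 3: a single exact match wins
    # Second vocabulary: the subset this user has actually queried.
    second = [w for w in first if w in history]
    if not second:
        return max(first, key=lambda w: hotwords[w]["heat"])  # claim 3
    if len(second) == 1:
        return second[0]      # claim 3
    now = time.time()
    if all(now - history[w]["last_time"] > preset_interval for w in second):
        # claim 2: every query is stale -> highest query count wins
        return max(second, key=lambda w: history[w]["count"])
    # claim 2: otherwise the most recently queried hot word wins
    return min(second, key=lambda w: now - history[w]["last_time"])
```

The no-exact-match branch (a `None` return here) is where claims 4 to 6 take over, generating replaced pinyins from the user's pronunciation features and scoring the candidates they hit.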
CN202211487069.2A 2022-11-25 2022-11-25 Speech recognition method and related product Active CN115547337B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211487069.2A CN115547337B (en) 2022-11-25 2022-11-25 Speech recognition method and related product


Publications (2)

Publication Number Publication Date
CN115547337A true CN115547337A (en) 2022-12-30
CN115547337B CN115547337B (en) 2023-03-03

Family

ID=84719741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211487069.2A Active CN115547337B (en) 2022-11-25 2022-11-25 Speech recognition method and related product

Country Status (1)

Country Link
CN (1) CN115547337B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115860823A * 2023-03-03 2023-03-28 Shenzhen Renma Interactive Technology Co., Ltd. Data processing method in human-computer interaction questionnaire answering scene and related product

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106030699A * 2014-10-09 2016-10-12 Google Inc. Hotword detection on multiple devices
CN109346060A (en) * 2018-11-28 2019-02-15 珂伯特机器人(天津)有限公司 Audio recognition method, device, equipment and storage medium
CN109920432A * 2019-03-05 2019-06-21 Baidu Online Network Technology (Beijing) Co., Ltd. Speech recognition method, apparatus, device and storage medium
US20200035217A1 (en) * 2019-08-08 2020-01-30 Lg Electronics Inc. Method and device for speech processing
CN111292745A * 2020-01-23 2020-06-16 Beijing SoundAI Technology Co., Ltd. Method and device for processing voice recognition result and electronic equipment
CN113160822A * 2021-04-30 2021-07-23 Beijing Baidu Netcom Science and Technology Co., Ltd. Speech recognition processing method, speech recognition processing device, electronic equipment and storage medium
CN113223516A * 2021-04-12 2021-08-06 Beijing Baidu Netcom Science and Technology Co., Ltd. Speech recognition method and device
US20220092276A1 (en) * 2020-09-22 2022-03-24 Samsung Electronics Co., Ltd. Multimodal translation method, apparatus, electronic device and computer-readable storage medium
US20220165277A1 (en) * 2020-11-20 2022-05-26 Google Llc Adapting Hotword Recognition Based On Personalized Negatives



Similar Documents

Publication Publication Date Title
US10417344B2 (en) Exemplar-based natural language processing
CN110069608B (en) Voice interaction method, device, equipment and computer storage medium
US9930167B2 (en) Messaging application with in-application search functionality
US20190147052A1 (en) Method and apparatus for playing multimedia
US20180286459A1 (en) Audio processing
US8170537B1 (en) Playing local device information over a telephone connection
US20110167350A1 (en) Assist Features For Content Display Device
US10313713B2 (en) Methods, systems, and media for identifying and presenting users with multi-lingual media content items
US20130268826A1 (en) Synchronizing progress in audio and text versions of electronic books
US20240070217A1 (en) Contextual deep bookmarking
CN112102841B (en) Audio editing method and device for audio editing
AU2006325555B2 (en) A method and apparatus for accessing a digital file from a collection of digital files
CN105912586B (en) Information searching method and electronic equipment
KR101567449B1 (en) E-Book Apparatus Capable of Playing Animation on the Basis of Voice Recognition and Method thereof
CN115547337B (en) Speech recognition method and related product
KR102353797B1 (en) Method and system for suppoting content editing based on real time generation of synthesized sound for video content
CN105684012B (en) Providing contextual information
CN112825088A (en) Information display method, device, equipment and storage medium
CN113360127B (en) Audio playing method and electronic equipment
JP7229296B2 (en) Related information provision method and system
US20140297285A1 (en) Automatic page content reading-aloud method and device thereof
CN115687807A (en) Information display method, device, terminal and storage medium
CN112837668B (en) Voice processing method and device for processing voice
CN114630179A (en) Audio extraction method and electronic equipment
JP7562610B2 (en) Content editing support method and system based on real-time generation of synthetic sound for video content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant