CN118155613A - Voice processing method, device, equipment and medium

Voice processing method, device, equipment and medium

Info

Publication number
CN118155613A
CN118155613A (application CN202211550685.8A)
Authority
CN
China
Prior art keywords
service type
semantic
service
determining
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211550685.8A
Other languages
Chinese (zh)
Inventor
张斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Co Wheels Technology Co Ltd
Original Assignee
Beijing Co Wheels Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Co Wheels Technology Co Ltd filed Critical Beijing Co Wheels Technology Co Ltd
Priority to CN202211550685.8A priority Critical patent/CN118155613A/en
Publication of CN118155613A publication Critical patent/CN118155613A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F40/00 Handling natural language data
                    • G06F40/30 Semantic analysis
                    • G06F40/20 Natural language analysis
                        • G06F40/279 Recognition of textual entities
                            • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L15/00 Speech recognition
                    • G10L15/08 Speech classification or search
                        • G10L15/18 Speech classification or search using natural language modelling
                            • G10L15/1822 Parsing for meaning understanding
                    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
                        • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present disclosure relate to a voice processing method, device, equipment, and medium. The method includes: in response to a received voice control instruction, determining a semantic service object corresponding to the voice control instruction and a first service type of the semantic service object; when there are multiple first service types, determining a second service type of each application currently in an open state, and matching the second service types with the first service types to obtain a matching result; and when the matching result is that a first target service type successfully matching the first service types exists among the second service types, providing the voice service corresponding to the semantic service object through the application corresponding to the first target service type. In the embodiments of the present disclosure, when multiple candidate semantics exist, the correct semantics can be determined according to the service types of the already-open applications, which reduces clarification interactions with the user, shortens the time spent on semantic clarification, and improves voice processing efficiency.

Description

Voice processing method, device, equipment and medium
Technical Field
The present disclosure relates to the technical field of artificial intelligence, and in particular to a voice processing method, device, equipment, and medium.
Background
Artificial intelligence (AI) is a branch of computer science that attempts to understand the nature of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence; research in this area includes robotics, speech recognition, and the like. With the development of artificial intelligence, speech recognition based on AI has become a common mode of interaction.
In the related art, when a user's voice control instruction has multiple possible semantics, a clarification technique is typically adopted: the correct semantics are obtained through multiple rounds of interaction with the user. For example, when the voice control instruction is "play Hurry Year", the corresponding semantic service object may be of the song type or the video type, so the system must interact with the user again for semantic clarification, e.g., asking "do you want to listen to the song Hurry Year or watch the video Hurry Year?". Only when the user confirms "I want to listen to the song" is the final semantic recognition result determined to be "play the song Hurry Year". The time consumed interacting with the user to obtain the correct semantics makes voice processing inefficient.
Disclosure of Invention
In order to solve the above technical problems, or at least partially solve them, the present disclosure provides a voice processing method, apparatus, device, and medium. When the semantic service object corresponding to a voice control instruction has multiple service types, the application corresponding to the instruction is determined by matching the service types of the semantic service object against the service types of the currently open applications, so that the voice service is provided through the matched application. This reduces clarification interactions with the user, shortens the time spent on semantic clarification, and improves voice processing efficiency.
An embodiment of the present disclosure provides a voice processing method, including: in response to a received voice control instruction, determining a semantic service object corresponding to the voice control instruction and a first service type of the semantic service object; when there are multiple first service types, determining a second service type of an application currently in an open state, and matching the second service type with the first service types; and when a first target service type successfully matching the first service types exists among the second service types, providing the voice service corresponding to the semantic service object through the application corresponding to the first target service type.
An embodiment of the present disclosure also provides a voice processing device, including: a determining module configured to, in response to a received voice control instruction, determine a semantic service object corresponding to the voice control instruction and a first service type of the semantic service object; a matching module configured to, when there are multiple first service types, determine a second service type of an application currently in an open state and match the second service type with the first service types; and a processing module configured to, when a first target service type successfully matching the first service types exists among the second service types, provide the voice service corresponding to the semantic service object through the application corresponding to the first target service type.
An embodiment of the present disclosure also provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; the processor is configured to read the executable instructions from the memory and execute them to implement the voice processing method provided by the embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium storing a computer program for executing the voice processing method as provided by the embodiments of the present disclosure.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:
According to the voice processing scheme provided by the embodiment of the present disclosure, in response to a received voice control instruction, a semantic service object corresponding to the instruction and a first service type of the semantic service object are determined; when there are multiple first service types, a second service type of each application currently in an open state is determined and matched with the first service types; further, when a first target service type successfully matching the first service types exists among the second service types, the voice service corresponding to the semantic service object is provided through the application corresponding to the first target service type. In the embodiment of the present disclosure, when the semantic service object corresponding to a voice control instruction has multiple service types, the application corresponding to the instruction is determined by matching the service types of the semantic service object against those of the open applications, so that the voice service is provided through the matched application, which reduces clarification interactions with the user, shortens the time spent on semantic clarification, and improves voice processing efficiency.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
Fig. 1 is a schematic flow chart of a voice processing method according to an embodiment of the disclosure;
FIG. 2 is a flowchart of another speech processing method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating another speech processing method according to an embodiment of the present disclosure;
fig. 4 is a schematic diagram of a speech processing scenario provided in an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of another speech processing scenario provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of another speech processing scenario provided by an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a speech processing device according to an embodiment of the disclosure;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that the modifiers "a", "an", and "a plurality of" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art should understand them as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
In order to solve the above-mentioned problems, an embodiment of the present disclosure provides a voice processing method in which, when applications are already open, the application corresponding to a voice control instruction is determined by matching the semantic service object of the instruction against the service types of the open applications, so that the voice service is provided through the corresponding application, clarification interactions with the user are reduced, and voice processing efficiency is improved. The method is described below in connection with specific embodiments.
Fig. 1 is a schematic flowchart of a voice processing method provided by an embodiment of the present disclosure. The method may be performed by a voice processing apparatus, which may be implemented in software and/or hardware and may generally be integrated in an electronic device; the electronic device may in turn be integrated in equipment such as a vehicle, so that the voice processing method of this embodiment can also provide voice processing services on the vehicle. As shown in fig. 1, the method includes:
Step 101: in response to the received voice control instruction, a semantic service object corresponding to the voice control instruction and a first service type of the semantic service object are determined.
The semantic service object may be understood as the control object of the voice control instruction. For example, when the voice control instruction is "play Hurry Year", the corresponding semantic service object is "Hurry Year".
It should be noted that, in some possible embodiments, the voice information contained in the voice control instruction may be converted into text through a speech recognition technique, the part of speech of each word in the text is recognized, and the semantic service object is determined according to the recognition result, for example, by taking the noun that follows a verb as the semantic service object.
In some possible embodiments, a deep learning model may instead be trained on a large amount of sample data; the voice information contained in the voice control instruction is converted into text through a speech recognition technique, and the text is input into the pre-trained deep learning model to obtain the semantic service object output by the model.
Further, the first service type of the semantic service object is determined, where the first service type indicates the possible information types of the semantic service object. For example, when the semantic service object is "Hurry Year", the corresponding information type may be either a song or a video, so the first service types of "Hurry Year" include "song" and "video".
It should be noted that, in some possible embodiments, semantic recognition is performed on the voice control instruction to determine the corresponding semantic service object (the manner of semantic recognition may refer to the above embodiments). Further, a query request carrying the semantic service object may be sent to a preset server, and the first service type of the semantic service object fed back by the preset server is obtained, where the preset server may determine the first service type corresponding to the semantic service object from online data or from pre-stored offline data.
In some possible embodiments, search results corresponding to the semantic service object may be retrieved from a preset search platform, and the hit count of each candidate service type is determined from the search results. In actual execution, the candidate service types whose hit counts rank within a preset top number are considered further: for each such candidate, the ratio of its hit count to the total is computed; when the ratio is smaller than a preset ratio threshold, that candidate is not used as a first service type, and when no candidate's ratio exceeds the threshold, the candidate service type with the highest hit count is used as the first service type.
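The frequency heuristic above can be sketched as follows. This is a minimal illustration, not part of the disclosure: the function name, the top-N cutoff, and the threshold value are all assumptions.

```python
def select_first_service_types(type_counts, top_n=3, ratio_threshold=0.2):
    """Pick first service types from per-candidate search hit counts.

    Keep the top-N candidates whose share of all hits meets the
    threshold; if none qualify, fall back to the single most
    frequent candidate, as described above.
    """
    total = sum(type_counts.values())
    if total == 0:
        return []
    # Rank candidates by hit count, keep only the preset top number.
    ranked = sorted(type_counts.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    # Keep candidates whose share of total hits meets the threshold.
    kept = [t for t, count in ranked if count / total >= ratio_threshold]
    return kept if kept else [ranked[0][0]]
```

For example, hypothetical hit counts of `{"song": 60, "video": 35, "news": 5}` for "Hurry Year" would yield `["song", "video"]`, so both types survive as first service types.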
Step 102: when there are multiple first service types, determine a second service type of each application currently in an open state, and match the second service types with the first service types to obtain a matching result.
In one embodiment of the present disclosure, when there are multiple first service types, semantic clarification is not performed directly; instead, a second service type of each application currently in an open state is determined, for example the second service type of each in-vehicle application currently open, where the second service type indicates the application's service type. For example, for a song playing application the corresponding second service type is "song"; for a video playing application it is "video"; and so on.
In some possible embodiments, a program name of the application program currently in the on state may be obtained, and a pre-built correspondence is queried to obtain a second service type corresponding to the program name.
In this embodiment, after the second service type is determined, it is matched against the first service types to determine whether there is an open application that can directly provide the service corresponding to the voice control instruction.
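A minimal sketch of this matching step follows; the function name and the dict-based registry of open applications are illustrative assumptions, not the disclosed implementation.

```python
def find_serving_app(first_types, open_apps):
    """Return (app_name, matched_type) for the first open application
    whose second service type matches one of the first service types,
    or None when no open application matches (clarification needed)."""
    first = set(first_types)
    for name, second_type in open_apps.items():
        if second_type in first:
            return name, second_type
    return None
```

With first service types `["song", "video"]` and open applications `{"MapApp": "map", "VideoApp": "video"}`, the sketch returns `("VideoApp", "video")`, i.e., the first target service type plus the application that can serve it.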
Step 103: when the matching result is that a first target service type successfully matching the first service types exists among the second service types, provide the voice service corresponding to the semantic service object through the application corresponding to the first target service type.
In one embodiment of the present disclosure, when the matching result is that a first target service type successfully matching the first service types exists among the second service types, the voice service corresponding to the semantic service object is provided through the application corresponding to the first target service type. That is, among the currently open applications there is one whose second service type matches a first service type of the semantic service object, and the voice service is provided directly through that application. Not only is the serving application determined directly from the already-open applications, but voice processing can also proceed immediately through it; no semantic clarification is required, the number of clarification interactions is reduced, and voice processing efficiency is improved.
In the embodiment of the present disclosure, providing the voice service corresponding to the semantic service object through the application corresponding to the first target service type may include determining the multimedia resource corresponding to the semantic service object and playing that resource through the application. For example, when the semantic service object is "Hurry Year" with first service types "song" and "video", and the open applications have second service types "video" and "map", then since the second service type "video" matches the first service type "video", the playback service for "Hurry Year" can be provided directly through the "video" application.
Of course, in actual execution there may be multiple first target service types. In this case, the open time of the application corresponding to each first target service type may be determined, and the application whose open time is most recent is taken as the application that provides the voice service. Alternatively, the user's preference information may be obtained, and among the multiple first target service types, the application corresponding to the first target service type that best fits the user's preferences is taken as the application that provides the voice service.
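The two tie-breaking strategies just described can be sketched together; the tuple layout and function name are assumptions for illustration only.

```python
def pick_target_app(matched_apps, preferred_type=None):
    """matched_apps: list of (app_name, service_type, opened_at) tuples
    for open applications whose type matched a first service type.

    Prefer an app serving the user's preferred type when one is known;
    otherwise choose the most recently opened application."""
    if preferred_type is not None:
        for name, service_type, _ in matched_apps:
            if service_type == preferred_type:
                return name
    # Fall back to the application with the latest open timestamp.
    return max(matched_apps, key=lambda app: app[2])[0]
```

So with two matches, the video app opened most recently wins by default, while known preference information overrides recency.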
In summary, in the voice processing method of the embodiment of the present disclosure, in response to a received voice control instruction, a semantic service object corresponding to the instruction and a first service type of the semantic service object are determined; when there are multiple first service types, a second service type of each application currently in an open state is determined and matched with the first service types; further, when a first target service type successfully matching the first service types exists among the second service types, the voice service corresponding to the semantic service object is provided through the application corresponding to the first target service type. When the semantic service object has multiple possible service types, the correct semantics can thus be determined from the service types of the already-open applications, which reduces clarification interactions with the user, shortens the time spent on semantic clarification, and improves voice processing efficiency.
Based on the above embodiments, other matching results may exist after the second service type is matched with the first service types; for these, the embodiments of the present disclosure perform further semantic clarification processing.
In one embodiment of the present disclosure, as shown in fig. 2, the method further includes, after the second service type is matched with the first service types:
Step 201: when the matching result is that no first target service type successfully matching the first service types exists among the second service types, generate semantic clarification prompt information according to the multiple first service types.
In an embodiment of the disclosure, when the matching result is that no first target service type successfully matching the first service types exists among the second service types, semantic clarification prompt information is generated according to the first service types, where the semantic clarification prompt information generally contains the semantic service object and its multiple corresponding first service types.
For example, continuing with the semantic service object "Hurry Year": when no first target service type successfully matching the first service types exists among the second service types, the semantic clarification prompt generated from the first service types "song" and "video" may be "Do you want to listen to the song Hurry Year or watch the video Hurry Year?".
Step 202: perform semantic clarification prompt processing according to the semantic clarification prompt information.
It should be noted that the manner of semantic clarification prompt processing differs across application scenarios. For example, the semantic clarification prompt information may be played back as speech; or it may be detected whether the user is currently using the screen, and when so, the prompt may be presented as popup text for the user to act on.
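Generating the clarification prompt from the semantic service object and its candidate types can be sketched as below; the exact wording and the function name are illustrative assumptions, not the disclosed phrasing.

```python
def build_clarification_prompt(service_object, first_types):
    """Compose a clarification question listing each candidate first
    service type for the semantic service object."""
    options = " or ".join(
        f'the {service_type} "{service_object}"' for service_type in first_types
    )
    return f"Do you want {options}?"
```

The resulting string can then be played as speech or shown as popup text, per the scenario-dependent handling described above.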
Further, in response to receiving, within a preset time, a second target service type input according to the semantic clarification prompt information, the application corresponding to the second target service type is determined. The second target service type may be input by voice, or by tapping the label of the corresponding first service type in the popup text, and the like.
Furthermore, to ensure that opening the application corresponding to the second target service type does not affect driving safety, in the embodiment of the present disclosure the current driving parameter information of the vehicle is also identified, where the driving parameter information includes driving speed information, driving road type information, and the like. The current driving safety level is determined according to the current driving parameter information, for example by inputting the current driving parameter information into a pre-trained deep learning model to obtain the current driving safety level output by the model. Further, the program safety level of the application corresponding to the second target service type may be obtained, for example, by querying a preset correspondence. When the program safety level matches the current driving safety level, for example when the program safety level is greater than or equal to the current driving safety level, it is determined that running the application corresponding to the second target service type will not pose a potential hazard to driving safety. In this case, the voice service corresponding to the semantic service object is provided through the application corresponding to the second target service type: the application is started, and the multimedia resource corresponding to the semantic service object is obtained and played through it.
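The safety gate can be sketched as follows. The rule-based level computation is a hypothetical stand-in for the pre-trained deep learning model mentioned above, and every threshold here is an illustrative assumption.

```python
def driving_safety_level(speed_kmh, road_type):
    """Hypothetical rule-based stand-in for the trained model: the
    faster and busier the driving context, the higher (stricter)
    the current driving safety level."""
    level = 1
    if speed_kmh > 60:
        level += 1
    if road_type == "highway":
        level += 1
    return level

def may_launch(app_safety_level, speed_kmh, road_type):
    """Allow launching the application only when its program safety
    level is greater than or equal to the current driving safety
    level, as the matching rule above specifies."""
    return app_safety_level >= driving_safety_level(speed_kmh, road_type)
```

Under these assumed rules, a level-1 application would be blocked at highway speed but permitted in low-speed city driving.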
Thus, in this embodiment, semantic clarification is performed only when no first target service type successfully matching the first service types exists among the second service types; when such a first target service type does exist, no clarification is performed, which reduces the number of interactions required for semantic clarification.
In an embodiment of the present disclosure, it is also possible that no second target service type is input according to the semantic clarification prompt information within the preset time period. To improve the intelligence of voice processing in this case, the heat (popularity) information of each first service type may be determined. The heat information may be determined from the user's historical search counts for each first service type of the semantic service object, where the historical search count is proportional to the heat; alternatively, the play counts of the multimedia resources corresponding to each first service type of the semantic service object may be determined, and the heat information of each first service type is derived from those play counts, where the heat is proportional to the play count.
Further, a third target service type is determined among the first service types according to the heat information, and the application corresponding to the third target service type is determined, where the third target service type is usually the service type with the highest heat. The application corresponding to the third target service type is then started to provide the voice service corresponding to the semantic service object.
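This heat-based fallback can be sketched in a few lines; the function name and the play-count mapping are assumptions for illustration.

```python
def pick_by_heat(first_types, play_counts):
    """When the user gives no answer within the timeout, fall back to
    the candidate first service type whose multimedia resources have
    the highest play count (a proxy for heat)."""
    return max(first_types, key=lambda t: play_counts.get(t, 0))
```

For "Hurry Year", hypothetical play counts of `{"song": 120, "video": 300}` would select "video" as the third target service type, whose application is then started.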
In one embodiment of the present disclosure, in the case that the first service type is single, a fourth target service type corresponding to the first service type is determined, the application program corresponding to the fourth target service type is determined, and the voice service corresponding to the semantic service object is provided through that application program. The application program corresponding to the fourth target service type may be one that is currently open or one that has not yet been opened, without limitation. That is, when there is only one first service type, the corresponding application program is opened directly to provide the voice service, without matching the service types of the opened application programs against the first service type, thereby further improving the voice processing efficiency.
In summary, in the voice processing method of the embodiment of the present disclosure, under the condition that a first target service type successfully matched with the first service type does not exist in the second service type, semantic clarification prompt information is generated according to the plurality of first service types, and semantic clarification processing is performed according to the semantic clarification prompt information, so that both voice processing efficiency and the reliability of voice processing are taken into account.
Based on the above embodiments, it is easy to understand that, when determining the semantic service object and the first service type corresponding to the voice control instruction, in addition to the semantic information of the semantic recognition result corresponding to the voice control instruction, the relation between semantic terms in the semantic recognition result may be mined, and the semantic service object and the first service type corresponding to the voice control instruction determined based on that relation, thereby improving the recognition accuracy of the semantic service object and the first service type.
In one embodiment of the present disclosure, as shown in fig. 3, determining a semantic service object corresponding to the voice control instruction and a first service type of the semantic service object includes:
Step 301, performing semantic recognition according to the voice control instruction to obtain a target semantic recognition result.
In one embodiment of the present disclosure, semantic recognition is performed according to the voice control instruction to obtain a target semantic recognition result, where the semantic recognition result takes the form of semantic recognition text.
Step 302, identifying a plurality of semantic terms contained in the target semantic recognition result.
In this embodiment, the plurality of semantic terms included in the target semantic recognition result may be recognized according to a term part-of-speech recognition method or the like; that is, the target semantic recognition result is split into its constituent semantic terms.
Step 303, constructing a word segmentation tag unit set according to the plurality of semantic terms, wherein the word segmentation tag unit set is a unit set comprising line segmentation words and column segmentation words, and the line segmentation words and the column segmentation words are arranged in the same order according to the order of the plurality of semantic terms in the target semantic recognition result.
In the embodiment of the disclosure, in order to mine the relation between the semantic terms, a word segmentation tag unit set is constructed according to the plurality of semantic terms, where the word segmentation tag unit set is a unit set comprising line segmentation words and column segmentation words. That is, if the plurality of semantic terms comprise n terms, a square table composed of the n terms can be constructed, the table comprising n*n cells, and the n*n cells serve as the sub-units forming the corresponding word segmentation tag unit set. The line segmentation words and the column segmentation words are arranged in the same order according to the order of the terms in the target semantic recognition result: both are arranged from front to back in the target semantic recognition result, or both are arranged from back to front, so that the order of the line segmentation words and the order of the column segmentation words are guaranteed to be the same. For example, as shown in fig. 4, if the target semantic recognition result contains 5 semantic terms, a 5*5 table is constructed.
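A minimal sketch of the n*n grid described above, assuming each cell starts unlabeled (labels are filled in by the later labeling step):

```python
def build_tag_unit_set(terms: list) -> dict:
    """Build the word segmentation tag unit set as an n x n grid.

    Row terms and column terms are the same semantic terms in the same
    front-to-back sentence order, so the grid has len(terms) ** 2
    sub-units, each keyed by (row term, column term).
    """
    return {(row, col): None for row in terms for col in terms}
```

With 5 semantic terms this yields the 5*5 = 25 sub-units of the example.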
Step 304, labeling corresponding information category labels on all sub-units consisting of line segmentation and column segmentation in the segmentation label unit set.
Each sub-unit consisting of a line segmentation word and a column segmentation word in the word segmentation tag unit set is labeled with a corresponding information category tag, so that the word segmentation tag unit set contains two dimensions of information: one dimension is the information category of each semantic term (the semantic terms in the table are all of the semantic terms, without prior extraction of candidate terms; whether a semantic term has a corresponding information category is reflected directly in the table), and the other dimension is the information category between semantic terms.
In some possible embodiments, a first word segmentation attribute of the line segmentation word and a second word segmentation attribute of the column segmentation word corresponding to each subunit in the word segmentation tag unit set may be determined, and the first word segmentation attribute and the second word segmentation attribute input into a preset word segmentation relation extraction model to obtain the information category tag of the corresponding subunit. The word segmentation relation extraction model is pre-trained to extract relations between word segmentation attributes that have semantic execution meaning (that is, are related to service objects and service types); for word segmentation attributes without semantic execution meaning, no relation is extracted, for example, for the relation between word segmentation attributes such as "want" and "want", the output result is "no information category".
For example, continuing with the scenario shown in fig. 4, as shown in fig. 5, corresponding information category labels are marked in the cells of the corresponding rows and columns, and the "no information category" label is marked in cells without a corresponding information category (in the embodiment of the present disclosure, "no information category" indicates that the relation between the corresponding row semantic term and column semantic term contributes little to identifying the semantic service object and the semantic service type). The information category labels marked in different cells may be the same or different; for example, the cell corresponding to "I" and "I" is marked "no information category", the cell corresponding to "today" and "today" is marked "no information category", the cell corresponding to "I" and "play" is marked "subject-verb", and so on.
Therefore, besides labeling the cells that have a specific information category, the cells without an information category are also explicitly labeled, so that the semantic terms carrying an information category do not need to be extracted in advance; instead, every semantic term in the sentence is traversed through the labeling of the information category labels in the table, thereby ensuring the accuracy of information category extraction.
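The labeling step can be sketched as below. The label vocabulary and the part-of-speech lookup are illustrative assumptions standing in for the preset word segmentation relation extraction model; only the "no information category" default for pairs without semantic execution meaning comes from the text.

```python
# Illustrative relation table: (row POS, column POS) -> information category.
POS_RELATION = {
    ("pronoun", "verb"): "subject-verb",
    ("verb", "noun"): "verb-object",
    ("noun", "noun"): "noun-noun",
}


def label_cell(row_pos: str, col_pos: str) -> str:
    # Pairs with no semantic execution meaning fall through to the
    # "no information category" label, as described in the text.
    return POS_RELATION.get((row_pos, col_pos), "no information category")


def label_units(grid_pos: dict) -> dict:
    """Label every sub-unit.

    grid_pos maps (row term, column term) -> (row POS, column POS).
    """
    return {cell: label_cell(*pos) for cell, pos in grid_pos.items()}
```

In a real system the lookup table would be replaced by the pre-trained relation extraction model operating on the two word segmentation attributes.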
Step 305, determining a semantic service object and a first service type of the semantic service object according to the information category labels in the word segmentation label unit set.
In an embodiment of the present disclosure, when determining a set of word segmentation tag units, a semantic service object and a first service type of the semantic service object are determined according to information category tags in the set of word segmentation tag units.
It should be noted that, because the information category labels in the word segmentation tag unit set reflect the relations between semantic terms, those relations can be combined to quickly obtain the segmentation words that are meaningful for executing the semantic command. For example, the semantic terms corresponding to information categories such as "noun-noun" can be used to determine the corresponding semantic service object, and the corresponding first service type can be determined according to the semantic terms corresponding to information categories such as "noun-verb" and "noun-noun".
In some possible embodiments, it may be appreciated that the table corresponding to the word segmentation tag unit set has certain table attribute information; for example, when the line segmentation words and the column segmentation words are identical, the cells are generally distributed symmetrically along a diagonal of the table, and the information categories related to the semantic service object are generally distributed along that diagonal. Therefore, the information categories relevant to identifying the semantic service object and the service type are further determined in combination with such table attribute information. In this embodiment, the table feature vector corresponding to the table attribute information is determined according to the word segmentation tag unit set. For example, a vector of the semantic word pair formed by the corresponding line segmentation word and column segmentation word under each information category may be extracted, and from the vector of the semantic word pair, a head vector h and a tail vector t of the semantic word pair may be extracted by using a multi-layer perceptron or the like, as shown in formula (1), where h, t ∈ ℝ^d, ℝ represents the real number field, and d is the dimension of the feature vector.
Furthermore, the head vector and the tail vector of each semantic word pair are combined into a combined vector, and the combined vectors of all the semantic word pairs are used as the corresponding table feature vector. Because the table feature vector is related to the information category labels, the information category labels corresponding to the first service type or to the semantic service object can be mined; for example, the corresponding semantic service object can be determined through information categories such as "noun-noun" and "verb-noun", and the table feature vectors obtained for semantic terms under the same information category are relatively consistent. Therefore, the table segmentation units corresponding to the information categories can be screened out through the table feature vector, such that the information category labels in each table segmentation unit belong to one major class and can cooperate to determine the corresponding semantic service object or first service type. Accordingly, the table segmentation positions of the word segmentation tag unit set are determined, and the word segmentation tag unit set is segmented according to those positions to obtain a plurality of table segmentation units, so that it can be better judged whether the information category labels in each table segmentation unit determine the corresponding semantic service object.
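A shape-level sketch of the head/tail vector construction, assuming random linear layers with tanh activations as a stand-in for the multi-layer perceptron (the actual weights, activation, and formula (1) are not reproduced in the source text):

```python
import numpy as np

d = 8  # feature dimension, an illustrative choice
rng = np.random.default_rng(0)
W_head = rng.standard_normal((d, d))  # stand-in MLP weights for the head vector
W_tail = rng.standard_normal((d, d))  # stand-in MLP weights for the tail vector


def pair_feature(v: np.ndarray) -> np.ndarray:
    """Map a semantic word pair vector v in R^d to its combined vector.

    h and t are the head and tail vectors in R^d; the combined vector is
    their concatenation, giving a vector in R^(2d).
    """
    h = np.tanh(W_head @ v)
    t = np.tanh(W_tail @ v)
    return np.concatenate([h, t])
```

Collecting the combined vectors of all semantic word pairs then yields the table feature vector used for screening table segmentation units.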
For example, as shown in fig. 6, taking the scenario shown in fig. 5 as an example, the table may be divided into 6 table division units according to the table feature vector, where the information categories in each table division unit belong to one major class; for example, the information category labels in the first table division unit belong to a major class that is irrelevant to determining the service type and the semantic service object.
Further, the first prediction score belonging to the semantic service object and the second prediction score belonging to the first service type in each table segmentation unit can be determined according to the information category label contained in each table segmentation unit, so that the semantic service object and the first service type can be determined according to the first prediction score and the second prediction score.
In some possible embodiments, a first number of first preset information categories belonging to the semantic service object in the information category labels included in each table segmentation unit may be determined, and a first prediction score determined according to the first number; likewise, a second number of second preset information categories belonging to the service type may be determined, and a second prediction score determined according to the second number. In other possible embodiments, the corresponding table feature vector in each table division unit may be input into a preset convolutional neural network to obtain the first prediction score belonging to the semantic service object and the second prediction score belonging to the first service type in each table division unit. Further, the semantic service object and the first service type are determined according to the first prediction score and the second prediction score. For example, if the first prediction score is greater than a first preset score, the noun in the semantic terms corresponding to the table division unit may be used as the semantic service object; if the second prediction score is greater than a second preset score, a first candidate service type corresponding to the verb in the corresponding semantic terms and a second candidate service type corresponding to the noun are identified, and the intersection of the first candidate service type and the second candidate service type is used as the first service type. For example, if the first candidate service type determined according to the verb in the table division unit covers both video playing and song playing, and the second candidate service type determined according to the noun is song playing, the intersection, song playing, is used as the first service type.
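The scoring and intersection logic can be sketched as below. The category names and the count-based score are illustrative assumptions; the intersection rule for the candidate service types comes directly from the text.

```python
# Illustrative preset categories that count toward the semantic service
# object; the real set is defined by the trained system.
OBJECT_CATEGORIES = {"noun-noun", "verb-object"}


def first_prediction_score(labels: list) -> int:
    """Count labels belonging to the first preset information categories."""
    return sum(1 for lab in labels if lab in OBJECT_CATEGORIES)


def first_service_type(verb_types: set, noun_types: set) -> set:
    # e.g. verb "play" -> {"song playing", "video playing"}, while a
    # song-name noun -> {"song playing"}; the intersection wins.
    return verb_types & noun_types
```

Here the verb narrows the service types by action and the noun narrows them by object, so their intersection is the service type consistent with both.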
In summary, according to the voice processing method of the embodiment of the disclosure, semantic recognition is performed according to the voice control instruction to obtain a target semantic recognition result, a plurality of semantic terms contained in the target semantic recognition result are recognized, and the semantic service object and the first service type of the semantic service object are determined based on the information categories among the plurality of semantic terms, which refines the determination process and ensures the determination accuracy of the semantic service object and the first service type. In order to implement the above embodiments, the present disclosure also proposes a speech processing apparatus.
Fig. 7 is a schematic structural diagram of a speech processing apparatus according to an embodiment of the present disclosure, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in an electronic device. As shown in fig. 7, the apparatus includes: a determination module 710, a matching module 720, and a processing module 730, wherein,
A determining module 710, configured to determine, in response to a received voice control instruction, a semantic service object corresponding to the voice control instruction and a first service type of the semantic service object;
a matching module 720, configured to determine a second service type of the application currently in the on state when the first service type is plural, and match the second service type with the first service type to obtain a matching result;
and the processing module 730 is configured to provide, when the matching result is that the second service type has the first target service type that is successfully matched with the first service type, a voice service corresponding to the semantic service object through an application program corresponding to the first target service type.
In one embodiment of the present disclosure, the determining module 710 is specifically configured to:
performing semantic recognition on the voice control instruction to determine a semantic service object corresponding to the voice control instruction;
and sending a query request carrying the semantic service object to a preset server, and acquiring a first service type of the semantic service object fed back by the preset server.
The voice processing device provided by the embodiment of the disclosure can execute the voice processing method provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of the execution method.
To achieve the above embodiments, the present disclosure also proposes a computer program product comprising a computer program/instruction which, when executed by a processor, implements the speech processing method in the above embodiments.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Referring now in particular to fig. 8, a schematic diagram of an electronic device 800 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device 800 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, as well as stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 8 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 8, the electronic device 800 may include a processor (e.g., a central processing unit, a graphics processor, etc.) 801 that may perform various appropriate actions and processes according to programs stored in a Read Only Memory (ROM) 802 or programs loaded from a memory 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the electronic device 800 are also stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
In general, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, etc.; memory 808 including, for example, magnetic tape, hard disk, etc.; communication means 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 shows an electronic device 800 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 809, or from memory 808, or from ROM 802. The above-described functions defined in the voice processing method of the embodiment of the present disclosure are performed when the computer program is executed by the processor 801.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, the clients, servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol ), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), the internet (e.g., the internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: in response to the received voice control instruction, determining a semantic service object corresponding to the voice control instruction and a first service type of the semantic service object, determining a second service type of an application program in a current starting state under the condition that the first service types are multiple, matching the second service type with the first service type, and further providing voice service corresponding to the semantic service object through the application program corresponding to the first target service type under the condition that a first target service type successfully matched with the first service type exists in the second service type. In the embodiment of the disclosure, when a plurality of service types exist in the voice service object corresponding to the voice control instruction, the application program corresponding to the voice control instruction is determined by matching the semantic service object corresponding to the voice control instruction with the service type of the opened application program, so that voice service is provided based on the corresponding application program.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including, but not limited to, object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the disclosure referred to in this disclosure is not limited to the specific combinations of features described above, but also covers other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (10)

1. A speech processing method, comprising:
in response to a received voice control instruction, determining a semantic service object corresponding to the voice control instruction and a first service type of the semantic service object;
when there are a plurality of first service types, determining a second service type of each application program currently in an open state, and matching the second service type against the first service type to obtain a matching result; and
when the matching result indicates that the second service types include a first target service type that successfully matches the first service type, providing, through an application program corresponding to the first target service type, a voice service corresponding to the semantic service object.
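The matching step of claim 1 can be sketched as follows. This is an illustrative sketch only: the function name, the data shapes (a list of candidate types, a mapping from open application to its service type), and the first-match policy are assumptions, not part of the claimed method.

```python
# Sketch of the claim-1 flow: intersect the first service types inferred
# from the voice command with the second service types of the currently
# open applications. All names and shapes are illustrative assumptions.
def match_service_types(first_types, open_apps):
    """first_types: service types inferred from the voice command.
    open_apps: mapping of application name -> its (second) service type.
    Returns (app_name, matched_type), or None when no open app matches."""
    second_types = {svc: app for app, svc in open_apps.items()}
    for svc in first_types:
        if svc in second_types:
            # A first target service type was found among the open apps.
            return second_types[svc], svc
    return None  # no match: claims 5-7 describe the clarification fallback

result = match_service_types(
    ["music", "navigation"],
    {"MapApp": "navigation", "PhoneApp": "call"},
)
```

When a tuple is returned, the corresponding application handles the voice service; a `None` result would trigger the semantic clarification prompt of claim 5.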
2. The method of claim 1, wherein determining the semantic service object corresponding to the voice control instruction and the first service type of the semantic service object comprises:
performing semantic recognition on the voice control instruction to obtain a target semantic recognition result;
identifying a plurality of semantic words contained in the target semantic recognition result;
constructing a word-segmentation tag unit set from the plurality of semantic words, wherein the word-segmentation tag unit set is a unit set composed of row segments and column segments, and the row segments and the column segments are both arranged in the order in which the plurality of semantic words appear in the target semantic recognition result;
labeling each sub-unit, formed by a row segment and a column segment, in the word-segmentation tag unit set with a corresponding information category label; and
determining the semantic service object and the first service type of the semantic service object according to the information category labels in the word-segmentation tag unit set.
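The word-segmentation tag unit set of claim 2 can be pictured as an n x n grid whose rows and columns are both the semantic words in sentence order, so that each cell pairs one row segment with one column segment and can later carry an information category label. The following is a minimal sketch under that reading; the cell layout and field names are assumptions.

```python
# Sketch of the claim-2 "word segmentation tag unit set": an n x n grid
# whose rows and columns both list the semantic words in the order they
# appear in the target semantic recognition result. Each cell (i, j) is
# a sub-unit that will later receive an information category label.
def build_tag_unit_set(semantic_words):
    n = len(semantic_words)
    return [
        [{"row": semantic_words[i], "col": semantic_words[j], "label": None}
         for j in range(n)]
        for i in range(n)
    ]

grid = build_tag_unit_set(["play", "some", "music"])
```

Claim 3 would then fill each cell's `label` by feeding the two segments' attributes to a relation extraction model.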
3. The method of claim 2, wherein labeling each sub-unit, formed by a row segment and a column segment, in the word-segmentation tag unit set with a corresponding information category label comprises:
determining a first segmentation attribute of the row segment and a second segmentation attribute of the column segment corresponding to each sub-unit in the word-segmentation tag unit set; and
inputting the first segmentation attribute and the second segmentation attribute into a preset segmentation relation extraction model to obtain the information category label of the corresponding sub-unit.
4. The method of claim 2 or 3, wherein determining the semantic service object and the first service type of the semantic service object according to the information category labels in the word-segmentation tag unit set comprises:
identifying table attribute information of the word-segmentation tag unit set, and determining a table feature vector corresponding to the table attribute information according to the word-segmentation tag unit set; determining a table segmentation position of the word-segmentation tag unit set according to the table feature vector, and segmenting the word-segmentation tag unit set at the table segmentation position to obtain a plurality of table segmentation units;
determining, according to the information category labels contained in each table segmentation unit, a first prediction score that the unit belongs to the semantic service object and a second prediction score that the unit belongs to the first service type; and
determining the semantic service object and the first service type according to the first prediction score and the second prediction score, respectively.
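The final selection step of claim 4 reduces to picking, among the table segmentation units, the one scoring highest as the semantic service object and the one scoring highest as the first service type. The sketch below assumes each unit has already been scored; the field names and argmax policy are illustrative assumptions.

```python
# Sketch of the claim-4 selection step: each table segmentation unit
# carries a first prediction score (semantic service object) and a
# second prediction score (first service type); the highest-scoring
# unit wins each role. Field names are illustrative assumptions.
def pick_by_scores(units):
    """units: list of dicts with 'text', 'object_score', 'type_score'."""
    obj = max(units, key=lambda u: u["object_score"])
    svc = max(units, key=lambda u: u["type_score"])
    return obj["text"], svc["text"]

obj, svc = pick_by_scores([
    {"text": "jazz playlist", "object_score": 0.9, "type_score": 0.2},
    {"text": "music", "object_score": 0.3, "type_score": 0.8},
])
```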
5. The method of claim 1, further comprising, after matching the second service type against the first service type to obtain the matching result:
when the matching result indicates that the second service types include no first target service type that successfully matches the first service type, generating semantic clarification prompt information according to the plurality of first service types; and
performing semantic clarification prompting according to the semantic clarification prompt information.
6. The method of claim 5, further comprising:
determining, according to a second target service type input in response to the semantic clarification prompt information and received within a preset time period, an application program corresponding to the second target service type;
identifying current driving parameter information of a vehicle, determining a current driving safety level according to the current driving parameter information, and determining a program security level of the application program corresponding to the second target service type; and
when the program security level matches the current driving safety level, providing, through the application program corresponding to the second target service type, the voice service corresponding to the semantic service object.
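The safety gate of claim 6 can be sketched as below. The claim does not specify how driving parameters map to a safety level or what "matched" means, so the numeric levels, the speed-based mapping, and the threshold rule here are all hypothetical assumptions for illustration.

```python
# Sketch of the claim-6 safety gate: the application chosen via the
# clarification prompt is launched only when its program security level
# is compatible with the current driving safety level. The level scale
# and the comparison rule are illustrative assumptions.
def driving_safety_level(speed_kmh):
    # Hypothetical mapping from one driving parameter (speed) to a level:
    # 0 = parked, 1 = low-speed driving, 2 = high-speed driving.
    if speed_kmh == 0:
        return 0
    return 1 if speed_kmh < 60 else 2

def may_launch(program_level, driving_level):
    # Assumed convention: a more demanding driving situation requires an
    # application rated for at least that level.
    return program_level >= driving_level
```

Under these assumptions, an app rated for high-speed use (level 2) passes the gate at 80 km/h, while a level-1 app does not.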
7. The method of claim 5 or 6, further comprising:
in response to not receiving, within the preset time period, a second target service type input according to the semantic clarification prompt information, determining heat information of each first service type;
determining a third target service type among the first service types according to the heat information, and determining an application program corresponding to the third target service type; and
starting the application program corresponding to the third target service type, so as to provide the voice service corresponding to the semantic service object through that application program.
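The timeout fallback of claim 7 amounts to choosing the "hottest" candidate when the user does not answer the clarification prompt. The sketch below treats heat information as a usage count per service type; that data source, like the function names, is an illustrative assumption.

```python
# Sketch of the claim-7 fallback: when no second target service type is
# received within the preset time period, the first service type with
# the highest heat (here modeled as a usage count) becomes the third
# target service type. The heat source is an illustrative assumption.
def fallback_service_type(first_types, heat):
    """first_types: candidate service types; heat: type -> usage count."""
    return max(first_types, key=lambda t: heat.get(t, 0))

chosen = fallback_service_type(
    ["music", "navigation"], {"music": 120, "navigation": 45}
)
```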
8. A speech processing apparatus, comprising:
a determining module configured to, in response to a received voice control instruction, determine a semantic service object corresponding to the voice control instruction and a first service type of the semantic service object;
a matching module configured to, when there are a plurality of first service types, determine a second service type of each application program currently in an open state, and match the second service type against the first service type to obtain a matching result; and
a processing module configured to, when the matching result indicates that the second service types include a first target service type that successfully matches the first service type, provide, through an application program corresponding to the first target service type, a voice service corresponding to the semantic service object.
9. An electronic device, comprising:
a processor; and a memory for storing instructions executable by the processor;
wherein the processor is configured to read the executable instructions from the memory and execute the executable instructions to implement the speech processing method of any one of claims 1-7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for executing the speech processing method of any one of claims 1-7.
CN202211550685.8A 2022-12-05 2022-12-05 Voice processing method, device, equipment and medium Pending CN118155613A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211550685.8A CN118155613A (en) 2022-12-05 2022-12-05 Voice processing method, device, equipment and medium


Publications (1)

Publication Number Publication Date
CN118155613A true CN118155613A (en) 2024-06-07

Family

ID=91297622




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination