CN118155613A - Voice processing method, device, equipment and medium

Info

Publication number
CN118155613A
Authority
CN
China
Prior art keywords
semantic
service type
service
segmentation
target
Legal status
Pending
Application number
CN202211550685.8A
Other languages
Chinese (zh)
Inventor
张斌
Current Assignee
Beijing Co Wheels Technology Co Ltd
Original Assignee
Beijing Co Wheels Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the present disclosure relate to a voice processing method, apparatus, device and medium. The method comprises: in response to a received voice control instruction, determining a semantic service object corresponding to the voice control instruction and a first service type of the semantic service object; where there are multiple first service types, determining a second service type of the application currently in an open state, and matching the second service type against the first service types to obtain a matching result; and, when the matching result is that the second service types include a first target service type that successfully matches a first service type, providing a voice service corresponding to the semantic service object through the application corresponding to the first target service type. In the embodiments of the present disclosure, when an instruction has multiple possible semantics, the correct semantics can be determined from the service types of the open applications, which reduces semantic-clarification interaction with the user, shortens the time spent on clarification, and improves voice processing efficiency.

Description

Voice processing method, device, equipment and medium

Technical Field

The present disclosure relates to the field of artificial intelligence technology, and in particular to a voice processing method, apparatus, device and medium.

Background

Artificial Intelligence (AI) is a branch of computer science that attempts to understand the essence of intelligence and to produce intelligent machines that respond in a manner similar to human intelligence. Research in this field includes robotics, language recognition, and so on. With the development of artificial intelligence, AI-based speech recognition has become a common mode of interaction.

In the related art, when a user's voice control instruction has multiple possible semantics, clarification is typically used: the system interacts with the user, possibly several times, so that the user selects the correct semantics and voice processing can be completed. For example, when the user's voice control instruction is "play Fleet of Time", the corresponding semantic service object may be a song or a video, so the system must interact with the user again to clarify, for instance by asking "Do you want the song or the video of Fleet of Time?". When the user confirms "I want to listen to the song", the final semantic recognition result is determined to be "play the song Fleet of Time". The interaction needed to obtain the correct semantics takes time, so voice processing efficiency is low.

Summary of the Invention

To solve the above technical problem, or at least partially solve it, the present disclosure provides a voice processing method, apparatus, device and medium. When the semantic service object corresponding to a voice control instruction has multiple service types, the application corresponding to the instruction is determined by matching the semantic service object against the service types of the open applications, so that the voice service can be provided through that application. This reduces semantic-clarification interaction with the user, shortens the time spent on clarification, and improves voice processing efficiency.

An embodiment of the present disclosure provides a voice processing method, the method comprising: in response to a received voice control instruction, determining a semantic service object corresponding to the voice control instruction and a first service type of the semantic service object; where there are multiple first service types, determining a second service type of the application currently in an open state, and matching the second service type against the first service types; and, where the second service types include a first target service type that successfully matches a first service type, providing a voice service corresponding to the semantic service object through the application corresponding to the first target service type.

An embodiment of the present disclosure also provides a voice processing apparatus, comprising: a determination module configured to determine, in response to a received voice control instruction, a semantic service object corresponding to the voice control instruction and a first service type of the semantic service object; a matching module configured to determine, where there are multiple first service types, a second service type of the application currently in an open state, and to match the second service type against the first service types; and a processing module configured to provide, where the second service types include a first target service type that successfully matches a first service type, a voice service corresponding to the semantic service object through the application corresponding to the first target service type.

An embodiment of the present disclosure also provides an electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to read the executable instructions from the memory and execute them to implement the voice processing method provided by the embodiments of the present disclosure.

An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program, the computer program being used to execute the voice processing method provided by the embodiments of the present disclosure.

Compared with the prior art, the technical solution provided by the embodiments of the present disclosure has the following advantages:

In the voice processing scheme provided by the embodiments of the present disclosure, in response to a received voice control instruction, the semantic service object corresponding to the instruction and the first service type of the semantic service object are determined. Where there are multiple first service types, the second service type of the application currently in an open state is determined and matched against the first service types. Where the second service types include a first target service type that successfully matches a first service type, the voice service corresponding to the semantic service object is provided through the application corresponding to the first target service type. Thus, when the semantic service object corresponding to a voice control instruction has multiple service types, the application corresponding to the instruction is determined by matching the semantic service object against the service types of the open applications, and the voice service is provided through that application. This reduces semantic-clarification interaction with the user, shortens the time spent on clarification, and improves voice processing efficiency.

Brief Description of the Drawings

The above and other features, advantages and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.

FIG. 1 is a flowchart of a voice processing method provided by an embodiment of the present disclosure;

FIG. 2 is a flowchart of another voice processing method provided by an embodiment of the present disclosure;

FIG. 3 is a flowchart of another voice processing method provided by an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a voice processing scenario provided by an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of another voice processing scenario provided by an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of another voice processing scenario provided by an embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of a voice processing apparatus provided by an embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the scope of protection of the present disclosure.

It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, the method embodiments may include additional steps and/or omit some of the steps shown. The scope of the present disclosure is not limited in this respect.

The term "including" and its variants as used herein are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.

It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are used only to distinguish different apparatuses, modules or units, and are not used to limit the order of, or interdependence between, the functions performed by these apparatuses, modules or units.

It should be noted that the modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be read as "one or more".

The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.

To solve the above problem, the embodiments of the present disclosure provide a voice processing method in which, when an application is open, the application corresponding to a voice control instruction is determined by matching the semantic service object corresponding to the instruction against the service types of the open applications, so that the voice service can be provided through that application. This reduces semantic-clarification interaction with the user and improves voice processing efficiency. The method is introduced below in conjunction with specific embodiments.

FIG. 1 is a flowchart of a voice processing method provided by an embodiment of the present disclosure. The method may be executed by a voice processing apparatus, which may be implemented in software and/or hardware and is generally integrated in an electronic device. The electronic device may in turn be integrated in equipment such as a vehicle, so the voice processing method of this embodiment can also provide voice processing services in a vehicle. As shown in FIG. 1, the method includes:

Step 101: In response to a received voice control instruction, determine a semantic service object corresponding to the voice control instruction and a first service type of the semantic service object.

The semantic service object can be understood as the control object to which the voice control instruction refers. For example, when the voice control instruction is "play Fleet of Time", the corresponding semantic service object is "Fleet of Time".

It should be noted that, in some possible embodiments, the speech contained in the voice control instruction can be converted into text by speech recognition, the part of speech of each word in the text is identified, and the semantic service object is determined from the result, for example by taking the noun that follows a verb as the semantic service object.

In some possible embodiments, a deep learning model can instead be trained on a large amount of sample data. The speech contained in the voice control instruction is converted into text by speech recognition, the text is input into the pre-trained deep learning model, and the semantic service object output by the model is obtained.
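The part-of-speech option described above can be illustrated with a minimal sketch. It assumes the text has already been tokenized and tagged by some upstream tagger; the tag names "VERB" and "NOUN" and the function name `extract_service_object` are hypothetical and do not come from the disclosure.

```python
from typing import List, Optional, Tuple

# (word, part-of-speech tag) pairs. The tags are assumed to come from an
# upstream tagger that this sketch does not implement.
TaggedToken = Tuple[str, str]


def extract_service_object(tokens: List[TaggedToken]) -> Optional[str]:
    """Return the run of nouns that follows the first verb, or None."""
    seen_verb = False
    noun_words: List[str] = []
    for word, tag in tokens:
        if not seen_verb:
            # Skip everything up to and including the first verb.
            seen_verb = tag == "VERB"
            continue
        if tag == "NOUN":
            noun_words.append(word)
        elif noun_words:
            break  # the noun run after the verb has ended
    return " ".join(noun_words) if noun_words else None


tagged = [("play", "VERB"), ("Fleet of Time", "NOUN")]
print(extract_service_object(tagged))
```

A rule of this kind is only as good as the tagging it receives, which is one reason the disclosure also mentions a trained model as an alternative.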

Further, the first service type of the semantic service object is determined. The first service type indicates the possible information types of the semantic service object. For example, when the semantic service object is "Fleet of Time" as above, the corresponding information type may be "song" or "video", so the first service types corresponding to "Fleet of Time" include "song" and "video".

It should be noted that, in some possible embodiments, semantic recognition is performed on the voice control instruction to determine the corresponding semantic service object, in the manner described in the above embodiments. A query request carrying the semantic service object is then sent to a preset server, and the first service type of the semantic service object fed back by the server is obtained. The preset server may determine the first service type corresponding to the semantic service object from online data or from pre-stored offline data.

In some possible embodiments, search results for the semantic service object may also be retrieved from a preset search platform, and the number of occurrences of each candidate service type in the search results is counted. The candidate service types whose counts rank within a preset top number are taken as the first service types. In actual execution, the ratio of each such candidate's count to the total number of search results may additionally be computed: when the ratio is less than a preset ratio threshold, the corresponding candidate service type is not taken as a first service type, and when the ratios of all top-ranked candidates are not greater than the threshold, the single candidate service type with the highest count is taken as the first service type. This further improves the accuracy of the first service type.
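The count-and-ratio selection just described can be sketched as follows. This is an illustration under assumptions, not the disclosed implementation: each search result is assumed to be already labelled with one candidate service type, and the name `select_first_service_types` and the default values of `top_n` and `min_ratio` are hypothetical. A candidate is kept when its ratio is at least the threshold, which is one reading of the boundary case.

```python
from collections import Counter
from typing import List


def select_first_service_types(result_types: List[str],
                               top_n: int = 2,
                               min_ratio: float = 0.2) -> List[str]:
    """Pick first service types from labelled search results."""
    if not result_types:
        return []
    counts = Counter(result_types)
    total = len(result_types)
    # Candidates whose counts rank within the preset top number.
    top = counts.most_common(top_n)
    # Drop candidates whose share of all results is below the threshold.
    kept = [stype for stype, count in top if count / total >= min_ratio]
    # If nothing passes, fall back to the single most frequent candidate.
    return kept if kept else [top[0][0]]


labels = ["song"] * 5 + ["video"] * 4 + ["book"]
print(select_first_service_types(labels))
```

With these labels both "song" and "video" pass the threshold, so the object is treated as having multiple first service types and the matching in step 102 is needed.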

Step 102: Where there are multiple first service types, determine the second service type of the application currently in an open state, and match the second service type against the first service types to obtain a matching result.

In one embodiment of the present disclosure, where there are multiple first service types, semantic clarification is not performed immediately. Instead, the second service type of the application currently in an open state is determined, for example the second service type of each in-vehicle application currently running in the vehicle. The second service type indicates the kind of service the application provides: for a song playback application the corresponding second service type is "song", and for a video playback application the corresponding second service type is "video".

In some possible embodiments, the program name of the application currently in an open state may be obtained, and a pre-built correspondence is queried to obtain the second service type corresponding to that program name.
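The name-based lookup can be sketched with a dictionary standing in for the pre-built correspondence. The table contents, the application names and the function name `second_service_types` are all hypothetical placeholders chosen for illustration.

```python
from typing import Dict, List

# Hypothetical pre-built correspondence: program name -> service type.
APP_SERVICE_TYPES: Dict[str, str] = {
    "MusicPlayer": "song",
    "VideoPlayer": "video",
    "Navigator": "map",
}


def second_service_types(open_apps: List[str]) -> Dict[str, str]:
    """Map each open application that has a known entry to its service type."""
    return {name: APP_SERVICE_TYPES[name]
            for name in open_apps
            if name in APP_SERVICE_TYPES}


print(second_service_types(["VideoPlayer", "Navigator", "UnknownApp"]))
```

Applications without an entry are simply skipped here; how a real system should treat them is not specified in the text above.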

In this embodiment, once the second service type has been determined, it is matched against the first service types in order to determine whether an application that can directly provide the service corresponding to the voice control instruction is already open.

Step 103: When the matching result is that the second service types include a first target service type that successfully matches a first service type, provide a voice service corresponding to the semantic service object through the application corresponding to the first target service type.

In one embodiment of the present disclosure, when the matching result is that the second service types include a first target service type that successfully matches a first service type, the voice service corresponding to the semantic service object is provided through the application corresponding to the first target service type. That is, among the currently open applications there is one whose second service type matches a first service type of the semantic service object, and the voice service is provided directly through that application. The application that provides the voice service is thus chosen from the applications already open, and no semantic clarification is needed, which reduces the number of clarification interactions and improves voice processing efficiency.

Providing the voice service corresponding to the semantic service object through the application corresponding to the first target service type, as referred to in the embodiments of the present disclosure, means using that application to determine the multimedia resource corresponding to the semantic service object and to play the multimedia resource. For example, when the semantic service object is "Fleet of Time", the corresponding first service types are "song" and "video", and the second service types of the open applications are "video" and "map", the second service type "video" is identical to the first service type "video", so playback of "Fleet of Time" can be provided directly through the "video" application.
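The matching in steps 102 and 103 amounts to intersecting the first service types with the service types of the open applications. The sketch below reproduces the example above; the function name `match_target_apps` and the application names are hypothetical.

```python
from typing import Dict, List


def match_target_apps(first_types: List[str],
                      open_app_types: Dict[str, str]) -> List[str]:
    """Return the open applications whose service type is a first service type."""
    wanted = set(first_types)
    return [app for app, stype in open_app_types.items() if stype in wanted]


# First service types of "Fleet of Time" and the types of the open apps.
first_types = ["song", "video"]
open_app_types = {"VideoPlayer": "video", "Navigator": "map"}
print(match_target_apps(first_types, open_app_types))
```

An empty result corresponds to the case where no first target service type exists and clarification is needed; more than one result corresponds to the case of multiple first target service types.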

Of course, in actual execution there may be multiple first target service types. In this case, the opening time of the application corresponding to each first target service type can be determined, and the application that was opened most recently is chosen to provide the voice service. Alternatively, the user's preference information can be obtained, and the application corresponding to the first target service type that best matches the user's preferences is chosen to provide the voice service.
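The first tie-breaking rule, choosing the most recently opened application, can be sketched as below. Opening times are assumed to be available as numeric timestamps where a larger value means a later opening; the function name `pick_most_recent` is hypothetical.

```python
from typing import Dict, Optional


def pick_most_recent(open_times: Dict[str, float]) -> Optional[str]:
    """Return the candidate application with the latest opening timestamp."""
    if not open_times:
        return None
    return max(open_times, key=lambda name: open_times[name])


# Both apps match a first service type; the video app was opened later.
print(pick_most_recent({"MusicPlayer": 100.0, "VideoPlayer": 250.0}))
```

The preference-based alternative would replace the timestamp with a per-type preference score, which the text above does not define.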

In summary, in the voice processing method of the embodiments of the present disclosure, in response to a received voice control instruction, the semantic service object corresponding to the instruction and the first service type of the semantic service object are determined. Where there are multiple first service types, the second service type of the application currently in an open state is determined and matched against the first service types. Where the second service types include a first target service type that successfully matches a first service type, the voice service corresponding to the semantic service object is provided through the application corresponding to the first target service type. Thus, when the semantic service object corresponding to a voice control instruction has multiple service types, the application corresponding to the instruction is determined by matching the semantic service object against the service types of the open applications. In other words, when multiple semantics are possible, the correct one can be determined from the service types of the open applications, which reduces semantic-clarification interaction with the user, shortens the time spent on clarification, and improves voice processing efficiency.

Based on the above embodiments, matching the second service type against the first service types may also produce other matching results. In the embodiments of the present disclosure, further semantic clarification processing is performed for these other results.

In one embodiment of the present disclosure, as shown in FIG. 2, after the second service type is matched against the first service types, the method further includes:

步骤201,当匹配结果为第二服务类型中不存在与第一服务类型匹配成功的第一目标服务类型时,根据多个第一服务类型生成语义澄清提示信息。Step 201: When the matching result is that there is no first target service type in the second service type that successfully matches the first service type, semantic clarification prompt information is generated according to multiple first service types.

在本公开的实施例中,当匹配结果为第二服务类型中不存在与第一服务类型匹配成功的第一目标服务类型时,根据第一服务类型生成语义澄清提示消息,其中,语义澄清提示消息通常包含语义服务对象以及语义服务对象对应的多个第一服务类型。In an embodiment of the present disclosure, when the matching result is that there is no first target service type in the second service type that successfully matches the first service type, a semantic clarification prompt message is generated based on the first service type, wherein the semantic clarification prompt message generally includes a semantic service object and multiple first service types corresponding to the semantic service object.

For example, continuing with the semantic service object "匆匆那年": when the second service type contains no first target service type that matches the first service type, the semantic clarification prompt information generated from the first service types "song" and "video" may be "Do you want to listen to the song 匆匆那年, or watch the video 匆匆那年?"
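As a rough sketch of how such a prompt could be assembled from the semantic service object and its first service types, consider the following. The phrase table `ACTION_PHRASES` and the function name are hypothetical and only serve the example.

```python
# Hypothetical phrasing for each service type; not part of the disclosure.
ACTION_PHRASES = {"song": "listen to the song", "video": "watch the video"}


def build_clarification_prompt(service_object, first_service_types):
    """Build one question that lists every candidate service type."""
    options = []
    for service_type in first_service_types:
        phrase = ACTION_PHRASES.get(service_type, "use the " + service_type)
        options.append(phrase + " " + service_object)
    return "Do you want to " + ", or ".join(options) + "?"


print(build_clarification_prompt("匆匆那年", ["song", "video"]))
```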

Step 202: perform semantic clarification prompt processing according to the semantic clarification prompt information.

It should be noted that the way the prompt is delivered differs between application scenarios. For example, the semantic clarification prompt information may be played back by voice. Alternatively, the method may detect whether the user is currently using the screen and, if so, display the semantic clarification prompt information as pop-up text.

Further, in response to receiving, within a preset duration, a second target service type entered according to the semantic clarification prompt information, the application corresponding to the second target service type is determined. The second target service type may be entered by voice, or by triggering the label of a first service type contained in the pop-up text.
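The prompt delivery and the timed reply described above might be modeled as below. This is only a sketch: the channel names, the representation of the reply as a string or `None`, and the use of elapsed seconds are assumptions introduced for illustration.

```python
def choose_prompt_channel(screen_in_use):
    """Pop-up text when the user is using the screen, voice playback otherwise."""
    return "popup_text" if screen_in_use else "voice_playback"


def second_target_service_type(reply, elapsed_s, preset_duration_s, first_service_types):
    """Return the user's choice if it arrived within the preset duration and
    names one of the candidate service types; otherwise return None."""
    if reply is None or elapsed_s > preset_duration_s:
        return None
    return reply if reply in first_service_types else None


print(choose_prompt_channel(screen_in_use=True))
print(second_target_service_type("video", 2.5, 5.0, ["song", "video"]))
```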

Then, to ensure that opening the application corresponding to the second target service type does not affect driving safety, the embodiments of the present disclosure also identify the current driving parameter information of the vehicle, which includes driving speed information, driving road type information, and the like. The current driving safety level is determined from this information, for example by inputting the current driving parameter information into a pre-trained deep learning model and taking the level it outputs. The program safety level of the application corresponding to the second target service type can then be obtained, for example by querying a preset correspondence. When the program safety level matches the current driving safety level, for example when the program safety level is greater than or equal to the current driving safety level, it is determined that running the application corresponding to the second target service type will not create a driving safety hazard.

In this case, the voice service corresponding to the semantic service object is provided through the application corresponding to the second target service type: the application is opened, obtains the multimedia resource corresponding to the semantic service object, and plays it. Thus, in this embodiment, semantic clarification processing is performed only when the second service type contains no first target service type that matches the first service type. When such a match exists, no semantic clarification is performed, which reduces the number of clarification interactions.
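The safety check can be illustrated with a small sketch. The disclosure obtains the driving safety level from a pre-trained deep learning model; the rule-based function below is only a stand-in for that model, and the level values, thresholds, and application names are invented for the example. The comparison itself (program safety level greater than or equal to the current driving safety level) follows the text.

```python
# Hypothetical preset correspondence between applications and safety levels.
APP_SAFETY_LEVEL = {"MusicPlayer": 3, "VideoPlayer": 1}


def driving_safety_level(speed_kmh, road_type):
    """Rule-based stand-in for the pre-trained model: a higher level means
    the current driving situation demands a safer application."""
    if speed_kmh == 0:
        return 1
    if road_type == "highway" or speed_kmh > 80:
        return 3
    return 2


def may_launch(app_name, speed_kmh, road_type):
    """Allow the application only if its program safety level is greater
    than or equal to the current driving safety level."""
    program_level = APP_SAFETY_LEVEL.get(app_name, 0)
    return program_level >= driving_safety_level(speed_kmh, road_type)


print(may_launch("MusicPlayer", 100, "highway"))
print(may_launch("VideoPlayer", 100, "highway"))
```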

In one embodiment of the present disclosure, it is possible that no second target service type is received within the preset duration. In that case, to make voice processing more intelligent, popularity information can be determined for each first service type. The popularity information may be determined from the user's historical search count for each first service type of the semantic service object, with popularity proportional to the historical search count. Alternatively, the playback count of the multimedia resources corresponding to each first service type of the semantic service object may be determined and the popularity information derived from that playback count, with popularity proportional to the playback count.

Further, a third target service type is selected from the multiple first service types according to the popularity information, usually the service type with the highest popularity, and the application corresponding to the third target service type is determined. That application is then opened to provide the voice service corresponding to the semantic service object.
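A minimal sketch of the popularity fallback follows, assuming popularity is taken directly as a historical search count or playback count per service type. The counts and the function name are hypothetical.

```python
def third_target_service_type(first_service_types, counts):
    """Pick the candidate service type with the highest count (search count
    or playback count). On a tie, the earliest candidate wins."""
    return max(first_service_types, key=lambda t: counts.get(t, 0))


# Hypothetical historical search counts for the same semantic service object.
search_counts = {"song": 120, "video": 45}
print(third_target_service_type(["song", "video"], search_counts))
```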

In one embodiment of the present disclosure, when there is only a single first service type, a fourth target service type corresponding to the first service type is determined, along with its application, and the voice service corresponding to the semantic service object is provided through that application. The application corresponding to the fourth target service type may be one that is already open or one that is not yet open; no limitation is imposed here. That is, when there is only one first service type, the corresponding application is used directly, without matching the service types of open applications against the first service type, which further improves voice processing efficiency.
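The single-type shortcut and the other branches of this embodiment can be put together in one decision function. This is a sketch under stated assumptions: `ask_user` is a hypothetical callback that returns the user's reply or `None` on timeout, and the second element of the returned tuple is a label added only to show which branch was taken.

```python
def resolve_service_type(first_types, open_types, ask_user, popularity):
    """Return (service_type, branch) following the order described in the text:
    single type, open-application match, user clarification, popularity."""
    if len(first_types) == 1:
        return first_types[0], "single"
    for service_type in first_types:
        if service_type in open_types:
            return service_type, "open_app"
    reply = ask_user(first_types)  # None models a timeout
    if reply in first_types:
        return reply, "clarified"
    best = max(first_types, key=lambda t: popularity.get(t, 0))
    return best, "popularity"


print(resolve_service_type(["song"], set(), lambda types: None, {}))
print(resolve_service_type(["song", "video"], {"map"}, lambda types: None,
                           {"song": 3, "video": 9}))
```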

In summary, when the second service type contains no first target service type that matches the first service type, the voice processing method of the embodiments of the present disclosure generates semantic clarification prompt information from the multiple first service types and performs semantic clarification prompt processing accordingly. This balances voice processing efficiency against voice processing reliability.

Based on the above embodiments, it is easy to understand that, when determining the semantic service object and the first service type corresponding to the voice control instruction, the method can consider not only the semantic information of the semantic recognition result itself but also the relationships between the semantic word segments in that result. Determining the semantic service object and the first service type from these relationships improves recognition accuracy.

In one embodiment of the present disclosure, as shown in FIG. 3, determining the semantic service object corresponding to the voice control instruction and the first service type of the semantic service object includes:

Step 301: perform semantic recognition on the voice control instruction to obtain a target semantic recognition result.

In one embodiment of the present disclosure, the target semantic recognition result obtained by semantic recognition of the voice control instruction takes the form of semantic recognition text.

Step 302: identify the multiple semantic word segments contained in the target semantic recognition result.

In this embodiment, the semantic word segments contained in the target semantic recognition result can be identified by word segmentation, part-of-speech recognition, and similar methods. For example, the target semantic recognition result may be segmented into "I, today, want, play, 匆匆那年".

Step 303: construct a word segment label unit set from the multiple semantic word segments. The set consists of units formed by row segments and column segments, where the row segments and the column segments are arranged identically, following the order of the semantic word segments in the target semantic recognition result.

In the embodiments of the present disclosure, to mine the relationships between semantic word segments, a word segment label unit set is constructed from them. If there are n semantic word segments, a square table of n*n cells can be built, and these n*n cells, as sub-units, form the word segment label unit set. The row segments and column segments are arranged identically: both follow the front-to-back order of the target semantic recognition result, or both follow its back-to-front order, or both follow the same shuffled order. What matters is that rows and columns use the same order, so that a row and a column with the same index correspond to the same semantic word segment of the target semantic recognition result.

For example, if the semantic word segments are "I, today, want, play, 匆匆那年", a 5*5 table is constructed as shown in FIG. 4, with both the rows and the columns corresponding to "I, today, want, play, 匆匆那年". The 25 cells of this table form the word segment label unit set.
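The table construction can be sketched in a few lines. Representing each cell as a (row segment, column segment) pair is an assumption made for illustration; the disclosure only requires that rows and columns use the same segment order.

```python
def build_unit_set(segments):
    """Build the n*n table whose cell (i, j) pairs row segment i with
    column segment j; rows and columns share the same order."""
    return [[(row, col) for col in segments] for row in segments]


segments = ["I", "today", "want", "play", "匆匆那年"]
table = build_unit_set(segments)
print(len(table), len(table[0]))  # 5 5
print(table[0][3])                # ('I', 'play')
```

Cells on the diagonal pair each segment with itself, which is the property the later steps rely on.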

Step 304: annotate each sub-unit of the word segment label unit set, formed by a row segment and a column segment, with the corresponding information category label.

Because each sub-unit formed by a row segment and a column segment is annotated with its information category label, the word segment label unit set carries information in two dimensions. One is the information category of each semantic word segment itself: the table contains all semantic word segments, so none need to be extracted beforehand, and whether a segment has an information category is visible in the table. The other is the information category between pairs of semantic word segments.

In some possible embodiments, the first segment attribute of the row segment and the second segment attribute of the column segment of each sub-unit can be determined and input into a preset segment relation extraction model to obtain the information category label of that sub-unit. The model is pre-trained to extract relations between segment attributes that carry semantic execution meaning, that is, those related to service objects and service types. In some scenarios, no relation is extracted for segment attributes without such meaning. For example, for the relation between the segments "want" and "want", the output is "no information category".

For example, continuing the scenario of FIG. 4, as shown in FIG. 5, each cell is annotated with the information category label for its row and column segments. Cells with no corresponding information category are annotated with the "no information category" label. In the embodiments of the present disclosure, ┴ may denote "no information category", which indicates that the row segment and column segment of the cell have little bearing on identifying the semantic service object and the service type. Different cells may carry the same or different labels. For example, the cell for "I" and "I" is labeled "no information category", the cell for "today" and "today" is labeled "no information category", the cell for "I" and "play" is labeled "subject-verb", and the cell for "I" and "匆匆那年" is labeled "subject-noun".
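The labeling of FIG. 5 can be imitated with a toy rule. The disclosure uses a pre-trained segment relation extraction model; the lookup table `ROLES` and the rule in `label_cell` below are hypothetical stand-ins chosen only so that the example cells named in the text receive the same labels.

```python
NO_CATEGORY = "┴"  # "no information category"

# Hypothetical roles; "today" and "want" are treated as carrying no
# semantic execution meaning in this toy example.
ROLES = {"I": "subject", "play": "verb", "匆匆那年": "noun"}


def label_cell(row_segment, col_segment):
    """Stand-in for the relation extraction model: label a cell with
    'rowrole-colrole' when both segments differ and both have a role."""
    if row_segment == col_segment:
        return NO_CATEGORY
    row_role = ROLES.get(row_segment)
    col_role = ROLES.get(col_segment)
    if row_role is None or col_role is None:
        return NO_CATEGORY
    return row_role + "-" + col_role


def label_table(segments):
    return [[label_cell(row, col) for col in segments] for row in segments]


labels = label_table(["I", "today", "want", "play", "匆匆那年"])
for row in labels:
    print(row)
```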

Thus, cells without an information category are represented in the table alongside cells with a specific information category, so semantic word segments with information categories need not be extracted in advance. Because annotating the table covers every semantic word segment paired with every other segment in the sentence, the accuracy of information category extraction is ensured.

Step 305: determine the semantic service object and the first service type of the semantic service object according to the information category labels in the word segment label unit set.

In the embodiments of the present disclosure, once the word segment label unit set has been determined, the semantic service object and its first service type are determined from the information category labels in the set.

It should be noted that, because the information category labels in the word segment label unit set reflect the relationships between semantic word segments, the segments that matter for executing the command can be identified quickly from those relationships. For example, the semantic service object can be determined from the segments in cells labeled with categories such as "noun-noun", and the first service type can be determined from the segments in cells labeled with categories such as "noun-verb" and "noun-noun".

In some possible embodiments, the table corresponding to the word segment label unit set has certain table attributes. For example, cells whose row segment and column segment are the same lie on the diagonal of the table, and information categories related to the semantic service object are usually distributed symmetrically about the diagonal. These table attributes can therefore be used to further identify the information categories relevant to the semantic service object and the service type. In one embodiment of the present disclosure, the table attributes are represented by a table feature vector, which is determined from the word segment label unit set. For example, for each information category in the set, the vector of the semantic segment pair formed by the row segment and the column segment can be extracted, and a multi-layer perceptron or the like can then be used to extract the head vector and the tail vector of the pair. In this embodiment, the head vector and the tail vector are extracted according to formula (1), where ℝ denotes the real number field and d is the dimension of the feature vector.

Then the head vector and tail vector of each semantic segment pair are combined into a single vector, and the combined vectors of all semantic segment pairs serve as the table feature vector. The table feature vector is related to the information category labels, and the labels from which a first service type or semantic service object can be derived are relatively fixed; for example, the semantic service object can be determined from the "noun-noun" and "verb-noun" information categories. The table feature vectors obtained from the segments in these categories are also relatively consistent. The table feature vector can therefore be used to pick out the table segmentation units in which these categories lie. The information labels within one table segmentation unit belong to one broad class and can jointly determine the semantic service object or the first service type. Accordingly, the table segmentation positions of the word segment label unit set are determined from the table feature vector, and the set is divided at these positions to obtain multiple table segmentation units. The information category labels of each table segmentation unit make it easier to judge whether its semantic word segments can yield the semantic service object or the first service type.

For example, as shown in FIG. 6, continuing the scenario of FIG. 5, the table can be divided into 6 table segmentation units according to the table feature vector, with the information categories in each unit belonging to one broad class. The information category labels in the first table segmentation unit, for instance, belong to one broad class and are all unrelated to determining the service type and the semantic service object.

Further, from the information category labels contained in each table segmentation unit, a first prediction score that the unit belongs to the semantic service object and a second prediction score that it belongs to the first service type can be determined, and the semantic service object and the first service type are then determined from these scores. In this embodiment the determination is made at the granularity of table segmentation units, each containing information labels of one broad class, and each unit also reflects the information categories between its semantic word segments. Both properties make the determination of the semantic service object and the first service type more fine-grained.

In some possible embodiments, a first number is determined as the count of information category labels in each table segmentation unit that belong to the first preset information categories associated with the semantic service object, and the first prediction score is determined from the first number. Likewise, a second number is determined as the count of labels that belong to the second preset information categories associated with the service type, and the second prediction score is determined from the second number. In other possible embodiments, the table feature vector of each table segmentation unit is input into a preset convolutional neural network to obtain the first prediction score that the unit belongs to the semantic service object and the second prediction score that it belongs to the first service type.

The semantic service object and the first service type are then determined from the first prediction score and the second prediction score. For example, if the first prediction score is greater than a first preset score, the noun among the semantic word segments of the table segmentation unit can be taken as the semantic service object. If the second prediction score is greater than a second preset score, the first candidate service types corresponding to the verb among the unit's semantic word segments and the second candidate service types corresponding to the noun are identified, and their intersection is taken as the first service type. For example, when the verb "play" yields the first candidate service types "song" and "video", and the noun "匆匆那年" yields the second candidate service types "song" and "video", the first service types are determined to be "song" and "video".
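The count-based variant of the scoring and the intersection rule can be sketched as follows. The preset category sets and the use of the raw count as the score are assumptions for illustration (the category names are borrowed from the examples in the text); the convolutional-network variant is not shown.

```python
# Hypothetical preset information categories.
OBJECT_CATEGORIES = {"noun-noun", "verb-noun"}  # related to the service object
TYPE_CATEGORIES = {"noun-verb", "noun-noun"}    # related to the service type


def count_score(unit_labels, preset_categories):
    """Score a table segmentation unit by how many of its labels fall in
    the preset categories (the 'first number' or 'second number')."""
    return sum(1 for label in unit_labels if label in preset_categories)


def first_service_types(verb_candidates, noun_candidates):
    """Intersection of the verb's and the noun's candidate service types,
    keeping the order of the verb's candidates."""
    return [t for t in verb_candidates if t in noun_candidates]


unit_labels = ["verb-noun", "noun-verb", "┴", "noun-noun"]
print(count_score(unit_labels, OBJECT_CATEGORIES))
print(count_score(unit_labels, TYPE_CATEGORIES))
print(first_service_types(["song", "video"], ["song", "video"]))
```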

In summary, the voice processing method of the embodiments of the present disclosure performs semantic recognition on the voice control instruction to obtain a target semantic recognition result, identifies the multiple semantic word segments contained in it, and determines the semantic service object and its first service type based on the information categories between the segments. This makes the determination of the semantic service object and its first service type more fine-grained and ensures its accuracy.

To implement the above embodiments, the present disclosure further provides a voice processing apparatus.

FIG. 7 is a schematic structural diagram of a voice processing apparatus provided by an embodiment of the present disclosure. The apparatus can be implemented in software and/or hardware and can generally be integrated into an electronic device. As shown in FIG. 7, the apparatus includes a determination module 710, a matching module 720, and a processing module 730, wherein:

the determination module 710 is configured to determine, in response to a received voice control instruction, the semantic service object corresponding to the voice control instruction and the first service type of the semantic service object;

the matching module 720 is configured to determine, when there are multiple first service types, the second service type of the application currently in the open state, and to match the second service type against the first service type to obtain a matching result;

the processing module 730 is configured to provide, when the matching result indicates that the second service type contains a first target service type that matches the first service type, the voice service corresponding to the semantic service object through the application corresponding to the first target service type.

In one embodiment of the present disclosure, the determination module 710 is specifically configured to:

perform semantic recognition on the voice control instruction to determine the semantic service object corresponding to the voice control instruction;

send a query request carrying the semantic service object to a preset server, and obtain the first service type of the semantic service object returned by the preset server.
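The two duties of the determination module can be sketched as a small class with pluggable dependencies. The class name, the callables `recognize` and `query_server`, and the in-memory catalog that stands in for the preset server are all hypothetical; a real deployment would replace them with a semantic recognizer and a network request.

```python
class DeterminationModule:
    """Sketch of module 710: recognize the semantic service object, then
    ask a (stubbed) preset server for its first service types."""

    def __init__(self, recognize, query_server):
        self.recognize = recognize        # voice instruction -> service object
        self.query_server = query_server  # service object -> list of types

    def determine(self, voice_instruction):
        service_object = self.recognize(voice_instruction)
        first_service_types = self.query_server(service_object)
        return service_object, first_service_types


# Stubs standing in for the recognizer and the preset server.
catalog = {"匆匆那年": ["song", "video"]}
module = DeterminationModule(
    recognize=lambda text: text.replace("play ", ""),
    query_server=lambda obj: catalog.get(obj, []),
)
print(module.determine("play 匆匆那年"))
```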

The voice processing apparatus provided by the embodiments of the present disclosure can execute the voice processing method provided by any embodiment of the present disclosure, and has the functional modules and beneficial effects corresponding to that method.

To implement the above embodiments, the present disclosure further provides a computer program product comprising a computer program/instructions which, when executed by a processor, implement the voice processing method of the above embodiments.

FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.

Referring now to FIG. 8, a schematic structural diagram of an electronic device 800 suitable for implementing the embodiments of the present disclosure is shown. The electronic device 800 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 8 is only an example and should not limit the functions or scope of use of the embodiments of the present disclosure.

As shown in FIG. 8, the electronic device 800 may include a processor (for example, a central processing unit or a graphics processing unit) 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage device 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the electronic device 800. The processor 801, the ROM 802, and the RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

Typically, the following devices may be connected to the I/O interface 805: an input device 806 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, or a gyroscope; an output device 807 including, for example, a liquid crystal display (LCD), a speaker, or a vibrator; a storage device 808 including, for example, a magnetic tape or a hard disk; and a communication device 809. The communication device 809 may allow the electronic device 800 to communicate with other devices, wirelessly or by wire, to exchange data. Although FIG. 8 shows an electronic device 800 with various devices, it should be understood that not all of the devices shown are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication device 809, installed from the storage device 808, or installed from the ROM 802. When the computer program is executed by the processor 801, the above-mentioned functions defined in the speech processing method of the embodiments of the present disclosure are performed.

It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (radio frequency), or any suitable combination of the above.

In some embodiments, the client and the server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet), and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.

The computer-readable medium may be included in the electronic device, or may exist separately without being assembled into the electronic device.

The computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device: in response to a received voice control instruction, determines the semantic service object corresponding to the voice control instruction and the first service type of the semantic service object; in the case where there are multiple first service types, determines the second service type of each application currently in an open state and matches the second service type against the first service types; and, in the case where the second service types include a first target service type that successfully matches a first service type, provides the voice service corresponding to the semantic service object through the application corresponding to the first target service type. In the embodiments of the present disclosure, when the semantic service object corresponding to the voice control instruction has multiple service types, the service types of the semantic service object are matched against the service types of the open applications to determine the application corresponding to the voice control instruction, so that the voice service is provided by that application. Where multiple meanings are possible, the correct meaning can thus be selected according to the service types of the open applications, which reduces semantic clarification exchanges with the user, shortens the time spent on clarification, and improves voice processing efficiency.
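The matching step described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the function name, the representation of open applications as a mapping from application name to a single service type, and the rule of returning the first match are all hypothetical.

```python
from typing import Dict, List, Optional


def match_open_application(
    first_service_types: List[str],
    open_apps: Dict[str, str],
) -> Optional[str]:
    """Return the open application whose service type resolves the ambiguity.

    first_service_types: candidate service types of the semantic service object.
    open_apps: hypothetical mapping of open application name -> its service type
               (the "second service type" in the text).
    Returns None when no open application matches, in which case the caller
    would fall back to asking the user for clarification.
    """
    candidates = set(first_service_types)
    for app_name, second_service_type in open_apps.items():
        if second_service_type in candidates:
            # This service type plays the role of the first target service type.
            return app_name
    return None
```

For example, if an object could be a video or a piece of music and only a music player is open, the sketch selects the music player without prompting the user.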

Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as "C" or similar programming languages. The program code may be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, via the Internet using an Internet service provider).

The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.

The functions described above herein may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The above description is only a preferred embodiment of the present disclosure and an explanation of the technical principles used. Those skilled in the art should understand that the scope of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

In addition, although the operations are depicted in a specific order, this should not be understood as requiring that the operations be performed in the specific order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Similarly, although several specific implementation details are included in the above discussion, these should not be interpreted as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments, individually or in any suitable sub-combination.

Although the subject matter has been described in language specific to structural features and/or methodological logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims.

Claims (10)

1. A speech processing method, characterized by comprising the following steps:
in response to a received voice control instruction, determining a semantic service object corresponding to the voice control instruction and a first service type of the semantic service object;
in the case where there are multiple first service types, determining a second service type of an application currently in an open state, and matching the second service type with the first service type to obtain a matching result;
when the matching result is that the second service type includes a first target service type that successfully matches the first service type, providing a voice service corresponding to the semantic service object through the application corresponding to the first target service type.

2. The method according to claim 1, characterized in that determining the semantic service object corresponding to the voice control instruction and the first service type of the semantic service object comprises:
performing semantic recognition according to the voice control instruction to obtain a target semantic recognition result;
identifying a plurality of semantic word segments contained in the target semantic recognition result;
constructing a word segmentation label unit set according to the plurality of semantic word segments, wherein the word segmentation label unit set is a set of units formed by row word segments and column word segments, and the row word segments and the column word segments are both arranged in the order in which the plurality of semantic word segments appear in the target semantic recognition result;
labeling each subunit formed by a row word segment and a column word segment in the word segmentation label unit set with a corresponding information category label;
determining the semantic service object and the first service type of the semantic service object according to the information category labels in the word segmentation label unit set.

3. The method according to claim 2, characterized in that labeling each subunit formed by a row word segment and a column word segment in the word segmentation label unit set with a corresponding information category label comprises:
determining a first word segment attribute of the row word segment and a second word segment attribute of the column word segment corresponding to each subunit in the word segmentation label unit set;
inputting the first word segment attribute and the second word segment attribute into a preset word segment relationship extraction model to obtain the information category label of the corresponding subunit.

4. The method according to claim 2 or 3, characterized in that determining the semantic service object and the first service type of the semantic service object according to the information category labels in the word segmentation label unit set comprises:
identifying table attribute information of the word segmentation label unit set, and determining a table feature vector corresponding to the table attribute information according to the word segmentation label unit set;
determining a table segmentation position of the word segmentation label unit set according to the table feature vector, and segmenting the word segmentation labels according to the table segmentation position to obtain a plurality of table segmentation units;
determining, according to the information category labels contained in each table segmentation unit, a first prediction score of that table segmentation unit belonging to the semantic service object and a second prediction score of that table segmentation unit belonging to the first service type;
determining the semantic service object and the first service type according to the first prediction score and the second prediction score, respectively.

5. The method according to claim 1, characterized in that, after matching the second service type with the first service type to obtain a matching result, the method further comprises:
when the matching result is that the second service type includes no first target service type that successfully matches the first service type, generating semantic clarification prompt information according to the multiple first service types;
performing semantic clarification prompt processing according to the semantic clarification prompt information.

6. The method according to claim 5, characterized by further comprising:
in response to receiving, within a preset time period, a second target service type input according to the semantic clarification prompt information, determining an application corresponding to the second target service type;
identifying current driving parameter information of the vehicle, determining a current driving safety level according to the current driving parameter information, and determining a program safety level of the application corresponding to the second target service type;
when the program safety level matches the current driving safety level, providing the voice service corresponding to the semantic service object through the application corresponding to the second target service type.

7. The method according to claim 5 or 6, characterized by further comprising:
in response to not receiving, within the preset time period, a second target service type input according to the semantic clarification prompt information, determining popularity information of each first service type;
determining a third target service type among the multiple first service types according to the popularity information, and determining an application corresponding to the third target service type;
opening the application corresponding to the third target service type, so as to provide the voice service corresponding to the semantic service object through the application corresponding to the third target service type.

8. A speech processing device, characterized by comprising:
a determination module, configured to determine, in response to a received voice control instruction, a semantic service object corresponding to the voice control instruction and a first service type of the semantic service object;
a matching module, configured to determine, in the case where there are multiple first service types, a second service type of an application currently in an open state, and to match the second service type with the first service type to obtain a matching result;
a processing module, configured to provide, when the matching result is that the second service type includes a first target service type that successfully matches the first service type, a voice service corresponding to the semantic service object through the application corresponding to the first target service type.

9. An electronic device, characterized in that the electronic device comprises:
a processor; and a memory for storing instructions executable by the processor;
the processor being configured to read the executable instructions from the memory and execute the executable instructions to implement the speech processing method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program is used to execute the speech processing method according to any one of claims 1 to 7.
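The fallback behaviour recited in claims 5 to 7 can be illustrated with the sketch below. It is a rough illustration under several assumptions, not the claimed implementation: the `ask_user` callback, the numeric safety levels keyed by service type, the rule that a program "matches" the driving safety level when its level is at least as high, and the use of a single popularity score per service type are all hypothetical choices, since the claims do not fix them.

```python
from typing import Callable, Dict, List, Optional


def resolve_after_failed_match(
    first_service_types: List[str],
    ask_user: Callable[[str, float], Optional[str]],
    popularity: Dict[str, float],
    program_safety_level: Dict[str, int],
    driving_safety_level: int,
    timeout_s: float = 5.0,
) -> Optional[str]:
    """Pick a service type when no open application matched.

    ask_user(prompt, timeout_s) is a hypothetical callback that returns the
    service type chosen by the user, or None if nothing arrives in time.
    """
    # Claim 5: build a clarification prompt from the candidate service types.
    prompt = "Did you mean: " + ", ".join(first_service_types) + "?"
    chosen = ask_user(prompt, timeout_s)

    if chosen is not None:
        # Claim 6: serve only if the program's safety level is acceptable for
        # the current driving state (assumed rule: level >= required level).
        if program_safety_level.get(chosen, 0) >= driving_safety_level:
            return chosen
        return None

    # Claim 7: no reply within the preset period, so use the most popular type.
    return max(first_service_types, key=lambda t: popularity.get(t, 0.0))
```

Returning `None` when the safety check fails is a placeholder; the claims do not say what happens in that case.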

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211550685.8A CN118155613A (en) 2022-12-05 2022-12-05 Voice processing method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211550685.8A CN118155613A (en) 2022-12-05 2022-12-05 Voice processing method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN118155613A true CN118155613A (en) 2024-06-07

Family

ID=91297622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211550685.8A Pending CN118155613A (en) 2022-12-05 2022-12-05 Voice processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN118155613A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination