CN116600172A - Video playing method and device, storage medium and electronic device - Google Patents


Info

Publication number
CN116600172A
Authority
CN
China
Prior art keywords
voice
information
interactive
assistance
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310331423.0A
Other languages
Chinese (zh)
Inventor
宗绪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Haier Uplus Intelligent Technology Beijing Co Ltd
Original Assignee
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Haier Uplus Intelligent Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Haier Technology Co Ltd, Haier Smart Home Co Ltd, Haier Uplus Intelligent Technology Beijing Co Ltd filed Critical Qingdao Haier Technology Co Ltd
Priority to CN202310331423.0A priority Critical patent/CN116600172A/en
Publication of CN116600172A publication Critical patent/CN116600172A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N 21/44213 Monitoring of end-user related data
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/34 Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41 Structure of client; Structure of client peripherals
    • H04N 21/4104 Peripherals receiving signals from specially adapted client devices
    • H04N 21/4126 The peripheral being portable, e.g. PDAs or mobile phones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41 Structure of client; Structure of client peripherals
    • H04N 21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N 21/42203 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N 21/462 Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a Head-end, controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Social Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a video playing method and device, a storage medium, and an electronic device, relating to the technical field of smart homes. The video playing method comprises the following steps: receiving first voice interaction information sent by a first electronic device, and recognizing the interactive content of the first voice interaction information to generate a recognition result; sending first assistance information to a second electronic device when a preset trigger condition is met; receiving a target assistance response sent by the second electronic device based on its user's recognition of the interactive content contained in the first voice interaction information; and pushing a first target video related to first interactive content to a target electronic device based on the first interactive content contained in the target assistance response. The video playing method and device, the storage medium, and the electronic device are used to improve the recognition accuracy of voice instructions from elderly users, so as to meet their needs when using a smart television.

Description

Video playing method and device, storage medium and electronic device
Technical Field
The present application relates to the technical field of smart home, and in particular, to a video playing method, a video playing device, a storage medium, and an electronic device.
Background
With the acceleration of population aging, China gains millions of elderly people every year. Many elderly people speak with unclear articulation or inverted word order, or speak in dialect, so their needs cannot be expressed well.
In the related art, when elderly people use the voice interaction function of a smart television to select video programs, the system can hardly recognize their voice instructions accurately because of these speaking characteristics, and therefore cannot meet their needs when using the smart television.
Based on the above, a technical solution is urgently needed that can solve the problem of voice instructions not being accurately recognized while elderly people use a smart television, so as to meet their usage needs.
Disclosure of Invention
The application aims to provide a video playing method and device, a storage medium, and an electronic device, which are used to improve the recognition accuracy of voice instructions from elderly users, so as to meet their needs when using a smart television.
The application provides a video playing method, which comprises the following steps:
receiving first voice interaction information sent by a first electronic device, and recognizing the interactive content of the first voice interaction information to generate a recognition result; sending first assistance information to a second electronic device when a preset trigger condition is met, wherein the first assistance information is used to instruct the user of the second electronic device to identify the interactive content contained in the first voice interaction information; receiving a target assistance response sent by the second electronic device, wherein the target assistance response is sent by the second electronic device based on the user's recognition of the interactive content contained in the first voice interaction information; and pushing a first target video related to first interactive content to a target electronic device based on the first interactive content contained in the target assistance response; wherein the preset trigger condition includes either of the following: the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized; or the recognition result indicates that the interactive content is recognized, and voice interaction information similar to the first voice interaction information has been received a preset number of times within a preset duration.
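The preset trigger condition above can be sketched as follows. This is an illustrative model only; the names (`RecognitionResult`, `should_request_assistance`, the default threshold) are assumptions, not from the patent:

```python
from dataclasses import dataclass

@dataclass
class RecognitionResult:
    recognized: bool    # whether the interactive content was recognized
    similar_count: int  # similar voice messages received within the preset duration

def should_request_assistance(result: RecognitionResult, count_threshold: int = 3) -> bool:
    """Return True if first assistance information should be sent."""
    # Condition 1: the interactive content could not be recognized at all.
    if not result.recognized:
        return True
    # Condition 2: the content was recognized, but the same request keeps being
    # repeated, suggesting the recognition did not match the user's intent.
    return result.similar_count >= count_threshold
```

Either condition on its own suffices to trigger the assistance request, matching the "any one of the following" wording of the claim.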
Optionally, pushing the first target video related to the first interactive content to the target electronic device based on the first interactive content contained in the target assistance response includes: acquiring the first interactive content, and querying a video resource library for the first target video related to the first interactive content; and sending the video resource address of the first target video to the target electronic device, so that the target electronic device plays the first target video based on the video resource address.
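A minimal sketch of this query-and-push step, under the assumption that the video resource library can be represented as a mapping from interactive content to resource addresses and the push channel as a message list (both hypothetical representations):

```python
def push_target_video(first_interactive_content, video_repository, sent_messages):
    """Query the video resource library for a video related to the interactive
    content, then send its resource address to the target device."""
    address = video_repository.get(first_interactive_content)
    if address is not None:
        sent_messages.append({"action": "play", "address": address})
    return address
```

If no related video exists, nothing is pushed and the caller can fall back to other handling.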
Optionally, the second electronic device is one of a plurality of assisting devices, and sending the first assistance information to the second electronic device when the preset trigger condition is met includes: sending the first assistance information to each of the plurality of assisting devices when the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized.
Optionally, the second electronic device is one of a plurality of assisting devices, and sending the first assistance information to the second electronic device when the preset trigger condition is met includes: determining, based on the voice recognition result, the number of times that voice interaction information whose similarity to the first voice interaction information meets a preset similarity has been received within a preset duration; and sending the first assistance information to each of the plurality of assisting devices when the recognition result indicates that the interactive content contained in the first voice interaction information is recognized and the number of times meets a preset threshold, or when the recognition result indicates that the interactive content cannot be recognized and the number of times meets the preset threshold.
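The counting step can be sketched as below, assuming (purely for illustration) that each received voice interaction message is kept as a `(timestamp, similarity)` pair:

```python
def count_recent_similar(received, now, window_seconds, preset_similarity):
    """Count voice interaction messages received within the preset duration
    whose similarity to the first voice interaction information meets the
    preset similarity."""
    return sum(
        1
        for timestamp, similarity in received
        if now - timestamp <= window_seconds and similarity >= preset_similarity
    )
```

The result is then compared against the preset threshold to decide whether to send the assistance information.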
Optionally, receiving the target assistance response sent by the second electronic device includes: receiving the assistance responses sent by each of the plurality of assisting devices based on their users' recognition of the interactive content contained in the first voice interaction information, sorting the assistance responses by receiving time, and determining the assistance response with the earliest receiving time as the target assistance response.
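The earliest-response selection can be sketched as follows (the dictionary keys are an illustrative representation, not from the patent):

```python
def pick_target_response(responses):
    """Return the assistance response with the earliest receiving time,
    which is taken as the target assistance response."""
    if not responses:
        return None
    return min(responses, key=lambda response: response["received_at"])
```

Sorting by receiving time and taking the first element is equivalent; `min` avoids sorting the full list.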
Optionally, after pushing the first target video related to the first interactive content to the target electronic device, the method further includes: if second voice interaction information sent by the first electronic device is received within a preset time after the first target video is pushed to the target electronic device, judging, based on voice recognition, whether the interactive content contained in the second voice interaction information is consistent with that contained in the first voice interaction information; if it is consistent, acquiring the assistance responses sent by third electronic devices; sorting the assistance responses sent by the third electronic devices based on the assistance weight of each third electronic device; and pushing a second target video related to second interactive content to the target electronic device based on the second interactive content contained in the assistance response sent by the third electronic device with the highest assistance weight; wherein the third electronic devices are the assisting devices other than the second electronic device among the plurality of assisting devices, and the interactive content contained in the assistance responses sent by the third electronic devices differs from that contained in the target assistance response.
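The weight-based fallback selection can be sketched as follows, under the assumption that the cloud platform maintains an assistance weight per device (names are illustrative):

```python
def pick_by_assistance_weight(third_device_responses):
    """Among the responses from the third electronic devices, return the
    interactive content from the device with the highest assistance weight."""
    if not third_device_responses:
        return None
    best = max(third_device_responses, key=lambda response: response["weight"])
    return best["content"]
```

This is the retry path: when the user repeats the same request after the first push, the earliest response is assumed to have been wrong, and the remaining devices' answers are consulted in weight order.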
Optionally, judging, based on voice recognition, whether the interactive content contained in the second voice interaction information is consistent with that contained in the first voice interaction information includes: acquiring the first interactive voice in the first voice interaction information and the second interactive voice in the second voice interaction information; judging, based on the sound feature information of the first interactive voice and the second interactive voice, whether the interactive content indicated by the first interactive voice is consistent with that indicated by the second interactive voice; and if so, determining that the interactive content contained in the second voice interaction information is consistent with that contained in the first voice interaction information; otherwise, determining that it is inconsistent.
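The patent does not specify how the sound feature information is compared; one plausible realization is cosine similarity over feature vectors, sketched below (the function name, feature representation, and threshold are assumptions):

```python
import math

def same_interactive_content(features_a, features_b, threshold=0.9):
    """Judge whether two interactive voices indicate the same content by
    comparing their sound feature vectors with cosine similarity."""
    dot = sum(a * b for a, b in zip(features_a, features_b))
    norm_a = math.sqrt(sum(a * a for a in features_a))
    norm_b = math.sqrt(sum(b * b for b in features_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return False  # an empty or silent feature vector matches nothing
    return dot / (norm_a * norm_b) >= threshold
```

Any other distance over the feature space (e.g. DTW over frame-level features) would serve the same role in the claim.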
The application also provides a video playing device, which comprises:
the receiving module, configured to receive first voice interaction information sent by a first electronic device; the content recognition module, configured to recognize the interactive content of the first voice interaction information and generate a recognition result; the assistance request module, configured to send first assistance information to a second electronic device when a preset trigger condition is met, wherein the first assistance information is used to instruct the user of the second electronic device to identify the interactive content contained in the first voice interaction information; the assistance response module, configured to receive a target assistance response sent by the second electronic device, wherein the target assistance response is sent by the second electronic device based on the user's recognition of the interactive content contained in the first voice interaction information; and the content pushing module, configured to push a first target video related to first interactive content to a target electronic device based on the first interactive content contained in the target assistance response; wherein the preset trigger condition includes either of the following: the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized; or the recognition result indicates that the interactive content is recognized, and voice interaction information similar to the first voice interaction information has been received a preset number of times within a preset duration.
Optionally, the apparatus further comprises: an acquisition module; the acquisition module is used for acquiring the first interactive content and inquiring a first target video related to the first interactive content from a video resource library based on the first interactive content; the content pushing module is specifically configured to send a video resource address of the first target video to the target electronic device, so that the target electronic device plays the first target video based on the video resource address of the first target video.
Optionally, the second electronic device is one of a plurality of assisting devices; the assistance request module is specifically configured to send the first assistance information to each of the plurality of assisting devices when the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized.
Optionally, the apparatus further comprises: a determining module; the determining module is used for determining the number of times of receiving the voice interaction information, the similarity of which with the first voice interaction information meets the preset similarity, in the preset duration based on the voice recognition result; the assistance request module is specifically configured to send the first assistance information to each electronic device in the plurality of assistance devices when the recognition result indicates that the interaction content included in the first voice interaction information is recognized and the number of times of information meets a preset threshold; the assistance request module is specifically further configured to send the first assistance information to each electronic device in the plurality of assistance devices when the recognition result indicates that the interaction content included in the first voice interaction information cannot be recognized and the number of times of information meets a preset threshold.
Optionally, the assistance response module is specifically configured to receive the assistance responses sent by each of the plurality of assisting devices based on their users' recognition of the interactive content contained in the first voice interaction information, sort the assistance responses by receiving time, and determine the assistance response with the earliest receiving time as the target assistance response.
Optionally, the apparatus further comprises: a sequencing module; the determining module is further configured to determine, if second voice interaction information sent by the first electronic device is received within a preset time after the second voice interaction information is pushed to the target electronic device and the first target video, whether interaction content included in the second voice interaction information and the first voice interaction information is consistent based on voice recognition; the acquisition module is further configured to acquire an assistance response sent by a third electronic device when the second voice interaction information is consistent with the interaction content included in the first voice interaction information; the ordering module is used for ordering the assistance response sent by each electronic device in the third electronic device based on the assistance weight of each electronic device in the third electronic device; the content pushing module is further configured to push, to a target electronic device, a second target video related to the second interactive content based on the second interactive content included in the assistance response sent by the target electronic device with the highest assistance weight in the third electronic device; the third electronic device is other auxiliary devices except the second electronic device in the plurality of auxiliary devices; and the assistance response sent by the third electronic equipment is different from the interactive content contained in the target assistance response.
Optionally, the acquisition module is further configured to acquire the first interactive voice in the first voice interaction information and the second interactive voice in the second voice interaction information; the determining module is specifically configured to judge, based on the sound feature information of the first interactive voice and the second interactive voice, whether the interactive content indicated by the first interactive voice is consistent with that indicated by the second interactive voice; the determining module is further configured to determine that the interactive content contained in the second voice interaction information is consistent with that contained in the first voice interaction information if the two interactive voices indicate consistent content, and to determine that it is inconsistent otherwise.
The application also provides an electronic device comprising a memory in which a computer program is stored and a processor arranged to perform the steps of implementing the video playing method as described in any of the above by means of the computer program.
The present application also provides a computer-readable storage medium comprising a stored program, wherein the program when executed implements the steps of the video playback method as described in any one of the above.
The application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of a video playback method as described in any one of the above.
According to the video playing method and device, the storage medium, and the electronic device provided by the application, the cloud platform first receives the first voice interaction information sent by the first electronic device and recognizes its interactive content to generate a recognition result. The cloud platform then sends the first assistance information to the second electronic device when the preset trigger condition is met. Next, it receives the target assistance response, which the second electronic device sends based on its user's recognition of the interactive content contained in the first voice interaction information. Finally, it pushes the first target video related to the first interactive content to the target electronic device based on the first interactive content contained in the target assistance response. In this way, the accuracy of recognizing voice instructions from elderly users can be improved to a certain extent, thereby meeting their needs when using a smart television.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the technical solutions of the present application or the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show some embodiments of the present application, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a hardware environment of an interaction method of a smart device according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a video playing method provided by the present application;
fig. 3 is a schematic diagram of a system architecture to which the video playing method provided by the present application is applied;
fig. 4 is a schematic structural diagram of a video playing device provided by the present application;
fig. 5 is a schematic structural diagram of an electronic device provided by the present application.
Detailed Description
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of an embodiment of the present application, a video playing method is provided. The video playing method is widely applied to full-house intelligent digital control application scenes such as intelligent Home (Smart Home), intelligent Home equipment ecology, intelligent Home (Intelligence House) ecology and the like. Alternatively, in the present embodiment, the video playing method described above may be applied to a hardware environment constituted by the terminal device 102 and the server 104 as shown in fig. 1. As shown in fig. 1, the server 104 is connected to the terminal device 102 through a network, and may be used to provide services (such as application services and the like) for a terminal or a client installed on the terminal, a database may be set on the server or independent of the server, for providing data storage services for the server 104, and cloud computing and/or edge computing services may be configured on the server or independent of the server, for providing data computing services for the server 104.
The network may include, but is not limited to, at least one of: wired network, wireless network. The wired network may include, but is not limited to, at least one of: a wide area network, a metropolitan area network, a local area network, and the wireless network may include, but is not limited to, at least one of: WIFI (Wireless Fidelity), Bluetooth. The terminal device 102 may include, but is not limited to, a PC, a mobile phone, a tablet computer, an intelligent air conditioner, an intelligent smoke machine, an intelligent refrigerator, an intelligent oven, an intelligent cooking range, an intelligent washing machine, an intelligent water heater, an intelligent washing device, an intelligent dish washer, an intelligent projection device, an intelligent television, an intelligent clothes hanger, an intelligent curtain, an intelligent video, an intelligent socket, an intelligent sound box, an intelligent fresh air device, an intelligent kitchen and toilet device, an intelligent bathroom device, an intelligent sweeping robot, an intelligent window cleaning robot, an intelligent mopping robot, an intelligent air purifying device, an intelligent steam box, an intelligent microwave oven, an intelligent kitchen appliance, an intelligent purifier, an intelligent water dispenser, an intelligent door lock, and the like.
In the related art, an acoustic model can be used to recognize the speech characteristics of the elderly, but this approach requires separately training a model for elderly speakers in different regions and with different dialects, which is costly and performs poorly.
To address the above technical problems in the related art, an embodiment of the present application provides a video playing method in which, when the system cannot recognize an elderly user's voice instruction, relatives and friends can be requested to assist with recognition. With this assisted recognition, the accuracy of the system's recognition of elderly users' voice instructions can be greatly improved, thereby improving their experience of using a smart television.
The video playing method provided by the embodiments of the present application is described in detail below through specific embodiments and their application scenarios with reference to the accompanying drawings.
As shown in fig. 2, the video playing method provided by the embodiment of the present application is applied to a cloud platform, and the method may include the following steps 201 to 204:
Step 201, receiving first voice interaction information sent by a first electronic device, and recognizing the interactive content of the first voice interaction information to generate a recognition result.
The first voice interaction information includes an interaction voice between a user using the first electronic device and the first electronic device. The first electronic device may be a mobile terminal such as a mobile phone or a tablet.
After receiving the first voice interaction information sent by the first electronic device, the cloud platform may parse it to obtain the interactive voice, and recognize the interactive voice based on speech recognition technology to generate a corresponding recognition result.
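The handling of step 201 can be sketched as follows. This is a minimal illustration only, not the patent's actual implementation: the stub recognizer and the confidence threshold are assumptions standing in for a real speech-recognition service.

```python
def recognize_speech(audio_bytes):
    # Hypothetical stub standing in for a real ASR engine; returns
    # (text, confidence). A low confidence models the case where an
    # elderly user's speech cannot be reliably recognized.
    if not audio_bytes:
        return "", 0.0
    return "play square dance videos", 0.35

def handle_voice_interaction(first_voice_interaction_info):
    # Step 201: extract the interactive voice and generate a recognition result.
    audio = first_voice_interaction_info["interactive_voice"]
    text, confidence = recognize_speech(audio)
    recognized = confidence >= 0.6  # assumed confidence threshold
    return {
        "recognized": recognized,
        "text": text if recognized else None,
        "confidence": confidence,
    }

result = handle_voice_interaction({"interactive_voice": b"\x01\x02"})
```

In this sketch, a low-confidence result leaves `recognized` false, which is exactly the situation that triggers the assistance flow of step 202.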
For example, in the case where the user using the first electronic device is an elderly user, based on the speaking characteristics of the elderly user, the cloud platform may have difficulty in identifying the specific interactive content indicated by the interactive voice.
Step 202, sending first assistance information to the second electronic device under the condition that a preset trigger condition is met.
The first assistance information is used for instructing a user of the second electronic device to recognize the interactive content contained in the first voice interaction information. The preset trigger condition includes any one of the following: the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized; or voice interaction information similar to the first voice interaction information has been received a preset number of times within a preset time period (whether or not the recognition result indicates that the interactive content was recognized).
For example, if at least one of the above preset trigger conditions is met, an assistance request may be sent to a second electronic device that has established an association with the first electronic device, requesting the user of the second electronic device to assist in recognizing the interactive content of the interactive voice instruction contained in the first voice interaction information. The second electronic device may also be a mobile terminal such as a mobile phone or a tablet.
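The preset trigger conditions described above can be expressed as a small decision function. This is a hedged sketch; the exact threshold semantics are an assumption based on the conditions listed in this embodiment.

```python
def should_request_assistance(recognized, similar_count, count_threshold):
    # Trigger condition 1: the interactive content could not be recognized.
    if not recognized:
        return True
    # Trigger condition 2: similar voice interaction information was
    # received a preset number of times within the preset time period,
    # even though the content was nominally recognized.
    if similar_count >= count_threshold:
        return True
    return False
```

Either branch returning true would lead the cloud platform to send the first assistance information in step 202.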
For example, as shown in fig. 3, to enable the method in step 202, corresponding application programs may be installed on the first electronic device and the second electronic device, respectively, to establish a communication connection between the first electronic device and the cloud platform, and to establish a communication connection between the second electronic device and the cloud platform.
Step 203, receiving a target assistance response sent by the second electronic device.
The target assistance response is sent by the second electronic device based on the recognition result of the user on the interactive content contained in the first voice interactive information.
Illustratively, the first assistance information includes the interactive voice. After receiving the first assistance information, the second electronic device may remind the user of the second electronic device and play the interactive voice to that user. The user can recognize the interactive voice and feed back the correct interactive content to the second electronic device.
The second electronic device may send the feedback information to the cloud platform, that is, the target assistance response, after acquiring the feedback information of the user.
Step 204, pushing a first target video related to the first interactive content to a target electronic device based on the first interactive content contained in the target assistance response.
In an exemplary embodiment, after receiving a target assistance response sent by the second electronic device, the cloud platform may push, to the target electronic device, a first target video related to the first interactive content based on the first interactive content carried in the target assistance response.
The target electronic device may be an intelligent home appliance, such as a refrigerator, a television, or the like, which has a display device and is capable of playing video content. The target electronic device and the first electronic device may be the same electronic device or different electronic devices.
In an exemplary case where the target electronic device and the first electronic device are the same electronic device, the first electronic device may be an electronic device that has a display device such as a refrigerator or a television and is capable of playing video content. The first target video may be one or more videos.
For example, take the case where the first electronic device is mobile phone 1, the second electronic device is mobile phone 2, and the target electronic device is a smart television. When elderly user A searches for video content on mobile phone 1 and wishes to cast the searched content to the smart television for playback, user A may perform a voice search through an application installed on mobile phone 1, and mobile phone 1 sends user A's interactive voice to the cloud platform for recognition. If the cloud platform cannot recognize the interactive content contained in the interactive voice, it may request user B to assist by sending assistance information to mobile phone 2. User B may be a relative or friend of user A who is familiar with user A's speaking characteristics, can accurately recognize user A's real intention, and feeds that intention back to the cloud platform through mobile phone 2. For instance, if user A wants to watch videos related to "square dance" but the cloud platform cannot recognize user A's real intention from the interactive voice, the interactive voice may be forwarded to user B; after recognizing user A's real intention, user B feeds it back to the cloud platform. The cloud platform then retrieves the related video (that is, the first target video) from the content library and pushes it to the smart television.
Alternatively, in the embodiment of the present application, the cloud platform may push the first target video to the target device in the following manner.
Specifically, the step 204 may include the following steps 204a1 and 204a2:
step 204a1, obtaining the first interactive content, and querying a first target video related to the first interactive content from a video resource library based on the first interactive content.
Step 204a2, transmitting the video resource address of the first target video to the target electronic device, so that the target electronic device plays the first target video based on the video resource address of the first target video.
The video resource address may be illustratively a uniform resource locator (Uniform Resource Locator, URL) of the first target video.
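Steps 204a1 and 204a2 can be sketched as follows. The in-memory repository, device identifier, and URLs are hypothetical; a real content cloud platform would expose a search API and deliver the push message over its IoT channel.

```python
# Hypothetical in-memory video repository; a real content cloud platform
# would expose a search API instead.
VIDEO_REPOSITORY = {
    "square dance": "https://example.com/videos/square-dance-101.m3u8",
    "opera": "https://example.com/videos/opera-highlights.m3u8",
}

def query_target_video(interactive_content):
    # Step 204a1: query a video related to the interactive content.
    return VIDEO_REPOSITORY.get(interactive_content)

def push_video_url(target_device_id, url):
    # Step 204a2: build the push message carrying the video resource
    # address for the target electronic device to play.
    return {"device": target_device_id, "action": "play", "url": url}

message = push_video_url("smart-tv-001", query_target_video("square dance"))
```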
Illustratively, the cloud platform may include an internet of things (Internet of Things, IOT) cloud platform, which directly interfaces with user devices (e.g., the first electronic device and the second electronic device), and a content cloud platform. The IOT cloud platform receives the first voice interaction information sent by the first electronic device; after the second electronic device recognizes the real intention of the user, the IOT cloud platform retrieves the corresponding video resource from the content cloud platform, and once the content cloud platform returns the corresponding video resource, the IOT cloud platform sends the URL of the video resource to the target electronic device, which plays the video based on that URL.
The embodiment of the application also provides the following method for realizing the communication among the first electronic device, the second electronic device, the cloud platform and the target electronic device.
Step A1: after the smart television (that is, the target electronic device) is powered on and connected to a network, the application for controlling the television from a mobile phone (that is, the target application) is opened, an account registration and login interface is entered, and registration and login are completed using a mobile phone number and an SMS verification code.
Step A2: after the television end logs in to the target application, the application reads unique device identification information of the smart television, such as its MAC address and device model, automatically generates a device ID (unique device identification code) for the smart television, and automatically reports the device ID, account token, and other information of the smart television to the background of the target application service (hereinafter referred to as the IOT cloud platform).
Step A3: the cloud platform receives and processes the device information and account information of the smart television reported by the television end, automatically pairs the device ID of the smart television with the user account (mobile phone number) to establish a binding relationship, and returns binding-success information.
Step A4: the target application at the television end receives the binding-success information returned by the cloud platform and displays a prompt on the television screen indicating that the mobile phone number has been bound to the television; the target application interface then automatically closes and returns to running in the television background, and immediately reports the running state of the smart television to the cloud platform.
Step A5: the target application is opened at the mobile phone end (that is, the first electronic device or the second electronic device), the login interface is entered, and the user logs in to the target application with the mobile phone number registered and logged in at the television end.
Step A6: after the mobile phone end logs in successfully, the application automatically accesses the cloud platform, pulls the device information and running state information of the smart television bound to the account, and displays them on the mobile phone in the form of a device card showing the device name and online/offline state of the smart television.
Step A7: the cloud platform receives and manages the running state information reported by the smart television, provides the application end with a device query data interface for the bound smart television, and supports the application end in querying device binding information, real-time running state information, and the like.
Step A8: the next time the television is started and networked, the target application automatically logs in and runs in the television background.
For example, an association relationship among the first electronic device, the second electronic device, the cloud platform, and the target electronic device may be established based on the above steps.
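Steps A2, A3, and A7 above can be sketched as follows. The hash-based device-ID construction and the in-memory registries are assumptions for illustration; the embodiment does not specify how the ID is generated or how bindings are stored.

```python
import hashlib

def generate_device_id(mac, model):
    # Step A2: derive a unique device ID from the TV's MAC address and
    # device model. The hash construction is an assumption; the
    # embodiment does not specify the generation scheme.
    return hashlib.sha256(f"{mac}:{model}".encode()).hexdigest()[:16]

class IotCloudPlatform:
    # Steps A3 and A7: pair device IDs with user accounts and answer
    # device queries from the application end.
    def __init__(self):
        self.bindings = {}      # phone number -> device ID
        self.device_state = {}  # device ID -> running state

    def bind(self, phone_number, device_id):
        self.bindings[phone_number] = device_id
        return {"status": "bound", "device_id": device_id}

    def report_state(self, device_id, state):
        self.device_state[device_id] = state

    def query_devices(self, phone_number):
        device_id = self.bindings.get(phone_number)
        return {
            "device_id": device_id,
            "state": self.device_state.get(device_id, "offline"),
        }

platform = IotCloudPlatform()
tv_id = generate_device_id("AA:BB:CC:DD:EE:FF", "TV-2023")
platform.bind("13800000000", tv_id)
platform.report_state(tv_id, "online")
info = platform.query_devices("13800000000")
```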
Optionally, in the embodiment of the present application, the second electronic device is one of a plurality of assistance devices. The preset trigger condition is used to determine whether the cloud platform can accurately recognize the real intention of the elderly user; if it is determined, according to the preset trigger condition, that the cloud platform cannot do so, the cloud platform needs to request assistance from other users.
For example, based on at least one of the above preset trigger conditions, it may be determined whether the cloud platform is capable of recognizing the real intention of the elderly user. That is, the cloud platform may be considered to have failed to recognize the real intention of the elderly user when the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized; when the interactive content is recognized but the number of similar requests meets a preset threshold; when the interactive content cannot be recognized and the number of similar requests meets a preset threshold; or simply when the number of similar requests meets a preset threshold.
In one possible implementation, the step 202 may include the following step 202a:
step 202a, sending the first assistance information to each electronic device in the plurality of assistance devices when the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized.
For example, in consideration of the timeliness of feedback from different users, the cloud platform may send assistance information to a plurality of assistance devices, determine the electronic device that responds first as the second electronic device, and determine the assistance response fed back by that device as the target assistance response.
In another possible implementation manner, the step 202 may further include step 202b1 followed by any of the following steps 202b2 to 202b4:
Step 202b1, determining, based on the voice recognition result, the number of times voice interaction information whose similarity to the first voice interaction information meets a preset similarity has been received within a preset time period.
Step 202b2, when the recognition result indicates that the interactive content contained in the first voice interaction information is recognized and the number of times meets a preset threshold, sending the first assistance information to each of the plurality of assistance devices.
Step 202b3, when the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized and the number of times meets a preset threshold, sending the first assistance information to each of the plurality of assistance devices.
Step 202b4, when the number of times meets a preset threshold, sending the first assistance information to each of the plurality of assistance devices.
It can be understood that when voice interaction information with the same or similar content is received from the first electronic device multiple times within a short period, it indicates that the user of the first electronic device may not be satisfied with the video pushed based on the previous voice interaction information. In that case, regardless of the recognition result, an assistance request should be sent to the second electronic device associated with the first electronic device, requesting its user to assist in recognizing the interactive content of the interactive voice instruction contained in the first voice interaction information.
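The repeat-request counting of step 202b1 can be sketched with a sliding time window. To keep the sketch deterministic, similarity is reduced to exact text match and timestamps are passed in explicitly; both are assumptions, since a real system would compare voice features and use wall-clock time.

```python
from collections import deque

class RepeatRequestDetector:
    # Step 202b1 sketch: count how many similar voice requests arrive
    # within a sliding time window and report whether the preset
    # threshold has been met.
    def __init__(self, window_seconds, threshold):
        self.window = window_seconds
        self.threshold = threshold
        self.history = {}  # request text -> deque of arrival timestamps

    def observe(self, text, timestamp):
        times = self.history.setdefault(text, deque())
        times.append(timestamp)
        # Drop arrivals that fall outside the window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold

detector = RepeatRequestDetector(window_seconds=60, threshold=3)
hits = [detector.observe("square dance", t) for t in (0, 10, 20)]
```

Here the third similar request within the 60-second window meets the threshold, which would satisfy the repeat-based trigger condition.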
Optionally, in the embodiment of the present application, when the cloud platform requests assistance from other users, it may happen that an assisting user also fails to accurately recognize the real intention of the elderly user; in that case, the feedback of other assisting users needs to be consulted.
Specifically, based on the above step 202a or step 202b1 to step 202b4, the above step 203 may further include the following steps 203a1 and 203a2:
step 203a1, receiving assistance responses sent by each electronic device in the plurality of assistance devices based on the recognition result of the interactive content included in the first voice interaction information by the user, and sorting the assistance responses sent by the plurality of assistance devices according to the receiving time.
Step 203a2, determining the assistance response with the earliest receiving time as the target assistance response.
For example, if the recognition result of the interactive content contained in the target assistance response turns out to be incorrect, the other assistance responses that were not adopted may be used for further recognition of the interactive voice.
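Steps 203a1 and 203a2 can be sketched as follows: sort the assistance responses by receive time, adopt the earliest as the target assistance response, and keep the rest as fallbacks. The response fields and device names are illustrative assumptions.

```python
def select_target_response(responses):
    # Steps 203a1/203a2: sort assistance responses by receive time and
    # adopt the earliest one; the rest are kept as fallbacks in case
    # the adopted recognition later proves incorrect.
    ordered = sorted(responses, key=lambda r: r["received_at"])
    return ordered[0], ordered[1:]

responses = [
    {"device": "phone-2", "content": "square dance", "received_at": 12.5},
    {"device": "phone-3", "content": "opera", "received_at": 9.1},
]
target, fallbacks = select_target_response(responses)
```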
Illustratively, after the step 204, the video playing method provided in the embodiment of the present application may further include the following steps 205 to 208:
Step 205, if second voice interaction information sent by the first electronic device is received within a preset time after the first target video is pushed to the target electronic device, determining, based on voice recognition, whether the interactive content contained in the second voice interaction information is consistent with that contained in the first voice interaction information.
Step 206, acquiring the assistance responses sent by the third electronic devices when the second voice interaction information is consistent with the interactive content contained in the first voice interaction information.
Step 207, sorting the assistance responses sent by the third electronic devices based on the assistance weight of each of the third electronic devices.
Step 208, pushing a second target video related to the second interactive content to the target electronic device based on the second interactive content contained in the assistance response sent by the third electronic device with the highest assistance weight.
The third electronic devices are the assistance devices other than the second electronic device among the plurality of assistance devices. The assistance responses sent by the third electronic devices contain interactive content different from that contained in the target assistance response.
Illustratively, the assistance weight represents the degree of affinity between the user of each of the plurality of assistance devices and the user of the first electronic device: the higher the affinity, the better the assisting user knows the elderly user's speaking characteristics and the more accurately the interactive voice can be recognized, so the higher the assistance weight.
It can be understood that after the first target video is pushed to the target electronic device, if the same interactive content sent by the first electronic device is received in a short time, it can be determined that the interactive content included in the target assistance information fed back by the second electronic device is inaccurate, and at this time, a new interactive content needs to be selected from the assistance information sent by other assistance devices.
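The fallback selection of steps 207 and 208 can be sketched as picking the response whose sender has the highest assistance weight. The weight values, device names, and response contents below are assumptions for illustration.

```python
def select_by_assistance_weight(responses, weights):
    # Steps 207/208: among the responses from the remaining assistance
    # devices, adopt the one whose sender has the highest assistance
    # weight (i.e., the closest affinity to the elderly user).
    return max(responses, key=lambda r: weights.get(r["device"], 0.0))

# Hypothetical weights: phone-2 (the original second electronic device)
# is excluded here because its response was already tried and rejected.
weights = {"phone-2": 0.9, "phone-3": 0.6, "phone-4": 0.3}
third_device_responses = [
    {"device": "phone-3", "content": "opera"},
    {"device": "phone-4", "content": "news"},
]
chosen = select_by_assistance_weight(third_device_responses, weights)
```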
It should be noted that, because the cloud platform has not yet recognized the interactive content contained in the second voice interaction information, and the recognition result for the interactive content contained in the first voice interaction information is not necessarily accurate, whether the two pieces of interactive content are consistent can at this point only be determined from the sound feature information of the interactive voice contained in the first voice interaction information and that of the interactive voice contained in the second voice interaction information.
Specifically, the above step 205 may further include the following steps 205a1 to 205a3:
Step 205a1, obtaining a first interactive voice in the first voice interaction information and a second interactive voice in the second voice interaction information.
Step 205a2, determining, based on the sound feature information of the first interactive voice and the second interactive voice, whether the interactive content indicated by the first interactive voice is consistent with the interactive content indicated by the second interactive voice.
Step 205a3, if the interactive content indicated by the first interactive voice is consistent with that indicated by the second interactive voice, determining that the second voice interaction information is consistent with the interactive content contained in the first voice interaction information; otherwise, determining that they are inconsistent.
That is, based on the sound feature information of the interactive voices contained in the two pieces of voice interaction information, it can be judged whether the interactive content indicated by the two interactive voices is consistent. If it is consistent, the real intention of the elderly user needs to be determined again; if not, the new interactive voice can be recognized according to the normal flow.
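The sound-feature comparison of step 205a2 can be sketched with cosine similarity over feature vectors. The feature representation, the example vectors, and the 0.95 threshold are all assumptions; the embodiment does not specify which sound features or comparison metric are used.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def same_interactive_content(features_1, features_2, threshold=0.95):
    # Step 205a2 sketch: if the sound feature vectors of the two
    # interactive voices are similar enough, treat the two requests as
    # carrying the same interactive content.
    return cosine_similarity(features_1, features_2) >= threshold

consistent = same_interactive_content([0.2, 0.7, 0.1], [0.21, 0.69, 0.12])
```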
The second target video may also include one or more videos, for example.
According to the video playing method provided by the embodiment of the present application, the cloud platform first receives the first voice interaction information sent by the first electronic device and recognizes its interactive content to generate a recognition result. Then, when the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized, the cloud platform sends the first assistance information to the second electronic device. Next, it receives the target assistance response sent by the second electronic device based on the user's recognition of the interactive content contained in the first voice interaction information. Finally, it pushes a first target video related to the first interactive content to the target electronic device based on the first interactive content contained in the target assistance response. In this way, the accuracy of recognizing elderly users' voice instructions can be improved to a certain extent, better meeting elderly users' needs when using the smart television.
It should be noted that, in the video playing method provided by the embodiment of the present application, the execution body may be a video playing device, or a control module in the video playing device for executing the video playing method. In the embodiment of the present application, a video playing method performed by a video playing device is taken as an example, and the video playing device provided by the embodiment of the present application is described.
In the embodiments of the present application, the video playing method is illustrated with reference to the accompanying drawings. In specific implementations, the video playing method shown in the foregoing method drawings may also be implemented in combination with any other drawing illustrated in the foregoing embodiments, and details are not repeated here.
The video playing device provided by the present application is described below; the video playing device described below and the video playing method described above may be referred to in correspondence with each other.
Fig. 4 is a schematic structural diagram of a video playing device according to an embodiment of the present application, as shown in fig. 4, including:
a receiving module 401, configured to receive first voice interaction information sent by a first electronic device; a content recognition module 402, configured to recognize the interactive content of the first voice interaction information and generate a recognition result; an assistance request module 403, configured to send first assistance information to the second electronic device when a preset trigger condition is met, the first assistance information being used for instructing a user of the second electronic device to recognize the interactive content contained in the first voice interaction information; an assistance response module 404, configured to receive the target assistance response sent by the second electronic device, the target assistance response being sent by the second electronic device based on the user's recognition of the interactive content contained in the first voice interaction information; and a content pushing module 405, configured to push, to a target electronic device, a first target video related to the first interactive content based on the first interactive content contained in the target assistance response. The preset trigger condition includes any one of the following: the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized; or voice interaction information similar to the first voice interaction information has been received a preset number of times within a preset time period (whether or not the recognition result indicates that the interactive content was recognized).
Optionally, the apparatus further comprises: an acquisition module; the acquisition module is used for acquiring the first interactive content and inquiring a first target video related to the first interactive content from a video resource library based on the first interactive content; the content pushing module 405 is specifically configured to send the video resource address of the first target video to the target electronic device, so that the target electronic device plays the first target video based on the video resource address of the first target video.
Optionally, the second electronic device is one of a plurality of assisting devices; the assistance request module 403 is specifically configured to send the first assistance information to each electronic device in the plurality of assistance devices if the recognition result indicates that the interactive content included in the first voice interaction information cannot be recognized.
Optionally, the apparatus further comprises: a determining module; the determining module is used for determining the number of times of receiving the voice interaction information, the similarity of which with the first voice interaction information meets the preset similarity, in the preset duration based on the voice recognition result; the assistance request module is specifically configured to send the first assistance information to each electronic device in the plurality of assistance devices when the recognition result indicates that the interaction content included in the first voice interaction information is recognized and the number of times of information meets a preset threshold; the assistance request module is specifically further configured to send the first assistance information to each electronic device in the plurality of assistance devices when the recognition result indicates that the interaction content included in the first voice interaction information cannot be recognized and the number of times of information meets a preset threshold.
Optionally, the assistance response module 404 is specifically configured to receive assistance responses sent by each of the plurality of assistance devices based on the recognition result of the user on the interactive content included in the first voice interaction information, and order the assistance responses sent by the plurality of assistance devices according to the receiving time; the assistance response module 404 is specifically further configured to determine the assistance response with the earliest receiving time as the target assistance response.
Optionally, the apparatus further comprises: a sorting module. The determining module is further configured to determine, if second voice interaction information sent by the first electronic device is received within a preset time after the first target video is pushed to the target electronic device, whether the interactive content contained in the second voice interaction information is consistent with that contained in the first voice interaction information based on voice recognition. The acquisition module is further configured to acquire the assistance responses sent by the third electronic devices when the second voice interaction information is consistent with the interactive content contained in the first voice interaction information. The sorting module is configured to sort the assistance responses sent by the third electronic devices based on the assistance weight of each of the third electronic devices. The content pushing module 405 is further configured to push, to the target electronic device, a second target video related to the second interactive content based on the second interactive content contained in the assistance response sent by the third electronic device with the highest assistance weight. The third electronic devices are the assistance devices other than the second electronic device among the plurality of assistance devices, and the assistance responses sent by the third electronic devices contain interactive content different from that contained in the target assistance response.
Optionally, the acquisition module is further configured to acquire a first interactive voice in the first voice interaction information and a second interactive voice in the second voice interaction information. The determining module is specifically configured to determine, based on the sound feature information of the first interactive voice and the second interactive voice, whether the interactive content indicated by the first interactive voice is consistent with the interactive content indicated by the second interactive voice. The determining module is specifically further configured to determine that the second voice interaction information is consistent with the interactive content contained in the first voice interaction information if the interactive content indicated by the first interactive voice is consistent with that indicated by the second interactive voice, and otherwise to determine that they are inconsistent.
According to the video playing device provided above, the cloud platform first receives the first voice interaction information sent by the first electronic device and recognizes its interactive content to generate a recognition result. Then, when a preset trigger condition is met, the cloud platform sends the first assistance information to the second electronic device. Next, it receives the target assistance response sent by the second electronic device based on the user's recognition of the interactive content contained in the first voice interaction information. Finally, it pushes a first target video related to the first interactive content to the target electronic device based on the first interactive content contained in the target assistance response. In this way, the accuracy of recognizing elderly users' voice instructions can be improved to a certain extent, better meeting elderly users' needs when using the smart television.
Fig. 5 illustrates a physical schematic diagram of an electronic device. As shown in Fig. 5, the electronic device may include a processor 510, a communication interface 520, a memory 530, and a communication bus 540, where the processor 510, the communication interface 520, and the memory 530 communicate with one another through the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform a video playing method comprising: receiving first voice interaction information sent by a first electronic device, and recognizing the interactive content of the first voice interaction information to generate a recognition result; sending first assistance information to a second electronic device if a preset trigger condition is met, the first assistance information being used to instruct a user of the second electronic device to identify the interactive content contained in the first voice interaction information; receiving a target assistance response sent by the second electronic device based on the user's identification of the interactive content contained in the first voice interaction information; and pushing a first target video related to the first interactive content to a target electronic device based on the first interactive content contained in the target assistance response. The preset trigger condition includes either of the following: the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized; or the recognition result indicates that the interactive content contained in the first voice interaction information is recognized, and voice interaction information similar to the first voice interaction information has been received a preset number of times within a preset time period.
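The two preset trigger conditions can be expressed as a small predicate. This is a hypothetical sketch: the function name and the default threshold are assumptions, since the patent leaves the preset number of times unspecified.

```python
# Hypothetical check of the two preset trigger conditions.
def should_request_assistance(recognized: bool,
                              similar_count: int,
                              count_threshold: int = 3) -> bool:
    """Return True when the platform should send first assistance information."""
    # Condition 1: the interactive content could not be recognized at all.
    if not recognized:
        return True
    # Condition 2: recognition succeeded, but similar voice interaction
    # information was received a preset number of times within the preset
    # period, suggesting the pushed result did not satisfy the user.
    return similar_count >= count_threshold
```

Either failed recognition, or repeated similar requests after a successful recognition, triggers the assistance request.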
Further, the logic instructions in the memory 530 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present application also provides a computer program product. The computer program product comprises a computer program stored on a computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the video playing method provided by the above methods, the method comprising: receiving first voice interaction information sent by a first electronic device, and recognizing the interactive content of the first voice interaction information to generate a recognition result; sending first assistance information to a second electronic device if a preset trigger condition is met, the first assistance information being used to instruct a user of the second electronic device to identify the interactive content contained in the first voice interaction information; receiving a target assistance response sent by the second electronic device based on the user's identification of the interactive content contained in the first voice interaction information; and pushing a first target video related to the first interactive content to a target electronic device based on the first interactive content contained in the target assistance response. The preset trigger condition includes either of the following: the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized; or the recognition result indicates that the interactive content contained in the first voice interaction information is recognized, and voice interaction information similar to the first voice interaction information has been received a preset number of times within a preset time period.
In still another aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium comprises a stored program which, when run, performs the video playing method provided by the above methods, the method comprising: receiving first voice interaction information sent by a first electronic device, and recognizing the interactive content of the first voice interaction information to generate a recognition result; sending first assistance information to a second electronic device if a preset trigger condition is met, the first assistance information being used to instruct a user of the second electronic device to identify the interactive content contained in the first voice interaction information; receiving a target assistance response sent by the second electronic device based on the user's identification of the interactive content contained in the first voice interaction information; and pushing a first target video related to the first interactive content to a target electronic device based on the first interactive content contained in the target assistance response. The preset trigger condition includes either of the following: the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized; or the recognition result indicates that the interactive content contained in the first voice interaction information is recognized, and voice interaction information similar to the first voice interaction information has been received a preset number of times within a preset time period.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or, of course, by means of hardware. Based on this understanding, the foregoing technical solution, in essence or in the part contributing to the prior art, may be embodied in the form of a software product which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A video playing method, the method comprising:
receiving first voice interaction information sent by first electronic equipment, and identifying interaction content of the first voice interaction information to generate an identification result;
sending first assistance information to a second electronic device if a preset trigger condition is met; wherein the first assistance information is used to instruct a user of the second electronic device to identify the interactive content contained in the first voice interaction information;
receiving a target assistance response sent by the second electronic device; wherein the target assistance response is sent by the second electronic device based on the user's identification of the interactive content contained in the first voice interaction information;
pushing a first target video related to the first interactive content to a target electronic device based on the first interactive content contained in the target assistance response;
wherein the preset trigger condition includes either of the following: the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized; or the recognition result indicates that the interactive content contained in the first voice interaction information is recognized, and voice interaction information similar to the first voice interaction information has been received a preset number of times within a preset time period.
2. The video playing method of claim 1, wherein the second electronic device is one of a plurality of assistance devices;
and wherein sending the first assistance information to the second electronic device if the preset trigger condition is met includes:
sending the first assistance information to each of the plurality of assistance devices if the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized.
3. The video playing method of claim 1, wherein the second electronic device is one of a plurality of assistance devices;
and wherein sending the first assistance information to the second electronic device if the preset trigger condition is met includes:
determining, based on the recognition result, the number of times voice interaction information whose similarity with the first voice interaction information meets a preset similarity has been received within the preset time period;
sending the first assistance information to each of the plurality of assistance devices if the recognition result indicates that the interactive content contained in the first voice interaction information is recognized and the number of times meets a preset threshold;
or,
sending the first assistance information to each of the plurality of assistance devices if the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized and the number of times meets the preset threshold.
4. The video playing method according to claim 2 or 3, wherein receiving the target assistance response sent by the second electronic device includes:
receiving the assistance responses sent by each of the plurality of assistance devices based on the users' identification of the interactive content contained in the first voice interaction information, and sorting the assistance responses sent by the plurality of assistance devices by receiving time;
and determining the assistance response with the earliest receiving time as the target assistance response.
5. The method of claim 4, wherein after pushing the first target video related to the first interactive content to the target electronic device based on the first interactive content included in the target assistance response, the method further comprises:
if second voice interaction information sent by the first electronic device is received within a preset time after the first target video is pushed to the target electronic device, determining, based on voice recognition, whether the second voice interaction information is consistent with the interactive content contained in the first voice interaction information;
acquiring assistance responses sent by third electronic devices if the second voice interaction information is consistent with the interactive content contained in the first voice interaction information;
sorting the assistance responses sent by the third electronic devices based on the assistance weight of each third electronic device;
pushing a second target video related to the second interactive content to the target electronic device based on the second interactive content contained in the assistance response sent by the third electronic device with the highest assistance weight;
wherein the third electronic devices are the assistance devices, among the plurality of assistance devices, other than the second electronic device; and the interactive content contained in the assistance responses sent by the third electronic devices is different from that contained in the target assistance response.
6. The method according to claim 5, wherein determining, based on voice recognition, whether the second voice interaction information is consistent with the interactive content contained in the first voice interaction information includes:
acquiring a first interactive voice in the first voice interaction information and a second interactive voice in the second voice interaction information;
determining, based on sound feature information of the first interactive voice and the second interactive voice, whether the interactive content indicated by the first interactive voice is consistent with the interactive content indicated by the second interactive voice;
if the interactive content indicated by the first interactive voice is consistent with the interactive content indicated by the second interactive voice, determining that the second voice interaction information is consistent with the interactive content contained in the first voice interaction information; otherwise, determining that the second voice interaction information is inconsistent with the interactive content contained in the first voice interaction information.
7. The method of claim 1, wherein pushing the first target video related to the first interactive content to the target electronic device based on the first interactive content included in the target assistance response comprises:
acquiring the first interactive content, and inquiring a first target video related to the first interactive content from a video resource library based on the first interactive content;
and sending the video resource address of the first target video to the target electronic equipment, so that the target electronic equipment plays the first target video based on the video resource address of the first target video.
8. A video playing device, the device comprising:
the receiving module is used for receiving first voice interaction information sent by the first electronic equipment;
the content recognition module is used for recognizing the interactive content of the first voice interactive information and generating a recognition result;
the assistance request module is used for sending first assistance information to the second electronic equipment under the condition that a preset triggering condition is met; the first assistance information is used for indicating a user of the second electronic equipment to identify interactive contents contained in the first voice interactive information;
the assistance response module is used for receiving a target assistance response sent by the second electronic device;
the content pushing module is used for pushing a first target video related to the first interactive content to a target electronic device based on the first interactive content contained in the target assistance response;
wherein the preset trigger condition includes either of the following: the recognition result indicates that the interactive content contained in the first voice interaction information cannot be recognized; or the recognition result indicates that the interactive content contained in the first voice interaction information is recognized, and voice interaction information similar to the first voice interaction information has been received a preset number of times within a preset time period.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program when run performs the video playback method of any one of claims 1 to 7.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to perform the video playback method of any one of claims 1 to 7 by means of the computer program.
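The response-selection rules of claims 4 and 5 — pick the earliest-received assistance response first, then fall back to the highest-weighted third device when the user repeats the request — can be sketched as follows. The field names and data structures are assumptions made for illustration, not the patent's implementation.

```python
# Hypothetical sketch of the selection rules in claims 4 and 5.
from dataclasses import dataclass
from typing import List

@dataclass
class AssistanceResponse:
    device_id: str
    content: str
    received_at: float   # receive timestamp, used by claim 4
    weight: int = 0      # assistance weight, used by claim 5

def pick_target_response(responses: List[AssistanceResponse]) -> AssistanceResponse:
    """Claim 4: the assistance response received earliest becomes the target."""
    return min(responses, key=lambda r: r.received_at)

def pick_fallback_response(responses: List[AssistanceResponse],
                           excluded_device_id: str) -> AssistanceResponse:
    """Claim 5: among the remaining (third) devices, highest weight wins."""
    others = [r for r in responses if r.device_id != excluded_device_id]
    return max(others, key=lambda r: r.weight)
```

For example, if device "b" answers first, its response is pushed; if the user then repeats the same request, the response of the highest-weighted remaining device is used instead.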
CN202310331423.0A 2023-03-30 2023-03-30 Video playing method and device, storage medium and electronic device Pending CN116600172A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310331423.0A CN116600172A (en) 2023-03-30 2023-03-30 Video playing method and device, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN116600172A true CN116600172A (en) 2023-08-15

Family

ID=87603357




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination