CN110830837A - Video playing method, computer storage medium, player and server - Google Patents

Video playing method, computer storage medium, player and server

Info

Publication number
CN110830837A
CN110830837A (application CN201810890020.9A)
Authority
CN
China
Prior art keywords
voiceprint feature
sound signal
target video
new
volume
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810890020.9A
Other languages
Chinese (zh)
Inventor
刘雷晴 (Liu Leiqing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Beijing Youku Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youku Technology Co Ltd
Priority to CN201810890020.9A
Publication of CN110830837A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/433 Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41 Structure of client; Structure of client peripherals
    • H04N 21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N 21/42201 Biosensors, e.g. heat sensor for presence detection, EEG sensors or any limb activity sensors worn by the user
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41 Structure of client; Structure of client peripherals
    • H04N 21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N 21/42203 Sound input device, e.g. microphone

Abstract

The embodiments of the application disclose a video playing method, a computer storage medium, a player and a server. The method is provided with a first reference voiceprint feature library and a target video being played, and comprises the following steps: receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, stopping playing the target video. With this technical scheme, other people who enter a private space can be effectively prevented from hearing the played video.

Description

Video playing method, computer storage medium, player and server
Technical Field
The present application relates to the field of internet technologies, and in particular, to a video playing method, a computer storage medium, a player, and a server.
Background
In daily life, when watching a video in a private space such as a bedroom or a study, people usually do not want others who enter that space to hear the video being played.
At present, a user can only stop the video manually upon noticing that someone else has entered the private space, so as to keep that person from hearing the video being played. However, when the user is absorbed in the video, it is often hard to notice that someone has entered, and in that case it is difficult to keep others from hearing the played video.
Disclosure of Invention
An object of the present application is to provide a video playing method, a computer storage medium, a player and a server that can effectively prevent other people entering a private space from hearing the played video.
In order to achieve the above object, the present application provides a method for playing a video, where the method provides a first reference voiceprint feature library and a played target video; the method comprises the following steps: receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, stopping playing the target video.
In order to achieve the above object, the present application further provides a computer storage medium for storing a first reference voiceprint feature library and a played target video, and a computer program; the computer program, when executed by a processor, performs the steps of: receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, stopping playing the target video.
In order to achieve the above object, the present application further provides a player, where the player includes a processor and a computer storage medium, and at least two acoustic wave sensors are disposed on the player; the at least two acoustic wave sensors are used for receiving sound signals sent by a sound source.
In order to achieve the above object, an embodiment of the present application further provides a method for playing a video, where the method provides a first reference voiceprint feature library and a target video played by a client; the method comprises the following steps: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, feeding back specified information representing that the target video is stopped to be played to the client.
To achieve the above object, the present application further provides a server, which includes a memory and a processor, wherein the memory stores a first reference voiceprint feature library and a target video played by a client, and a computer program, when the computer program is executed by the processor, the server implements the following steps: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, feeding back specified information representing that the target video is stopped to be played to the client.
In order to achieve the above object, the present application further provides a method for playing a video, where the method is provided with a second reference voiceprint feature library and a played target video; the method comprises the following steps: receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, stopping playing the target video.
In order to achieve the above object, the present application further provides a computer storage medium for storing a second reference voiceprint feature library and a played target video, and a computer program; the computer program, when executed by a processor, performs the steps of: receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, stopping playing the target video.
In order to achieve the above object, an embodiment of the present application further provides a method for playing a video, where the method is provided with a second reference voiceprint feature library and a target video played by a client; the method comprises the following steps: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, feeding back specified information representing that the target video is stopped to be played to the client.
To achieve the above object, the present application provides a server, which includes a memory and a processor, the memory stores a second reference voiceprint feature library and a target video played by a client, and a computer program, when executed by the processor, the computer program implements the following steps: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, feeding back specified information representing that the target video is stopped to be played to the client.
As can be seen from the above, in the present application the first reference voiceprint feature library includes at least one reference voiceprint feature, which may be, for example, the voiceprint feature of the user. When the user watches a target video being played in his or her private space, the client can continuously receive sound signals emitted by sound sources; for example, if another person enters the private space, the client can receive the sound signal emitted by that person and identify at least one voiceprint feature included in it. The client may then determine whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library, such as the user's own voiceprint feature, and if not, stop playing the target video. In this way, even if another person enters the private space while the user is absorbed in the video, the client can detect the intrusion immediately and stop playback, so the situation that other people entering the private space hear the played video can be effectively avoided.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flowchart of an embodiment of a video playing method in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a player in an embodiment of the present application;
FIG. 3 is a flowchart of another embodiment of a video playing method in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a server in an embodiment of the present application;
FIG. 5 is a flowchart of another embodiment of a video playing method in an embodiment of the present application;
FIG. 6 is a flowchart of another embodiment of a video playing method in an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art without inventive work, based on the embodiments in the present application, shall fall within the scope of protection of the present application.
The embodiments of the application provide a video playing method that can be applied in a client-server system architecture. The server may be a device that stores video data. In particular, the server may be a backend business server of a website capable of providing video services, such as iQiyi, Sohu Video, Tencent Video, or AcFun. In this embodiment, the server may be an electronic device having data computation, storage, and network interaction functions; it may also be software running in such an electronic device to support data processing, storage, and network interaction. The number of servers is not particularly limited in this embodiment: the server may be one server, several servers, or a server cluster formed by several servers.
In this embodiment, the client may be an electronic device that renders video data and can capture sound signals. Specifically, the client may be, for example, a desktop computer, a tablet computer, a notebook computer, a smart phone, a digital assistant, a smart wearable device, or a television with network access, each having a sound capturing function. Alternatively, the client may be software running in such an electronic device. Specifically, the client may be a browser in the electronic device, and the browser may load an access portal provided by a video website platform such as iQiyi, Sohu Video, or AcFun; the access portal may be the home page of the website platform. The client may also be an application, provided by a video website platform, that runs in an intelligent terminal.
The embodiment of the application provides a video playing method, which can be applied to the client. The method may be provided with a first library of reference voiceprint features and a target video for playback.
In this embodiment, the server may be provided with a first reference voiceprint feature library. The first library of reference voiceprint features can be a data set storing voiceprint features. The first reference voiceprint feature library can adopt any one of database formats such as MySQL, Oracle, DB2 and Sybase. The first library of reference voiceprint features can be deployed on a storage medium in a server. Furthermore, the client may download the first reference voiceprint feature library from the server, and store the downloaded first reference voiceprint feature library in a memory, so as to perform subsequent voiceprint feature comparison. The storage may be a memory or a cache.
In this embodiment, the first reference voiceprint feature library may include at least one reference voiceprint feature. The reference voiceprint features included in the first reference voiceprint feature library may be, for example, the voiceprint features of the user, or other voiceprint features set by the user. Other voiceprint features can be selected according to the user's own wishes, for example, voiceprint features of family members of the user, or voiceprint features of close friends of the user, and the like.
In this embodiment, the target video played on the client may be a video currently viewed by the user. The video currently watched by the user may be played after the client renders the video data representing the target video. Wherein the video data can be downloaded by the client from a video database stored in the server. The video database may be a data set storing video data. The video database may be in any one of MySQL, Oracle, DB2, Sybase, etc. database formats.
Referring to fig. 1, the method for playing the video may include the following steps.
S11: a sound signal emitted by a sound source is received and at least one voiceprint feature included in the sound signal is identified.
In the present embodiment, the sound source may be a person who emits sound. Specifically, the sound source may be a person who enters or is near the user's private space, such as the user's bedroom or study. In practice, when a user watches a video played by the client in his or her bedroom or study, a person entering that room, or a person near it, may generate sound signals by speaking. Because such people are within or near the user's private space, they may hear the video played by the client. It is therefore necessary to capture the sound signals emitted by these sound sources through the client and perform subsequent processing to control video playback, so as to prevent these people from hearing the video the user is watching.
In this embodiment, the sound source may be one person or a plurality of persons, the sounds of different persons having different voiceprint characteristics. Then, the sound signal may include a sound signal emitted by at least one person. Thus, the sound signal may include at least one voiceprint feature therein.
In this embodiment, the client may receive a sound signal emitted by a sound source. Specifically, for example, a sound signal emitted from the sound source may be received by a microphone mounted on the client. Wherein, the number of the microphones loaded on the client can be one or more. In practical applications, microphones may be respectively disposed in different directions on the client, so that the client can receive sound signals from different directions as accurately as possible.
In this embodiment, after receiving a sound signal emitted by a sound source, the client may also identify at least one voiceprint feature included in the sound signal. Specifically, for example, after receiving the sound signal emitted by the sound source, the client may convert the sound signal from the time domain to the frequency domain to obtain the sound signal in the frequency domain, and may identify the voiceprint feature of the sound signal with the signal intensity greater than the specified intensity from the sound signal in the frequency domain.
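For illustration only, this conversion-and-identification step could be sketched in Python as below. The plain FFT, the relative strength threshold, and the crude spectral feature vector are all assumptions of this sketch; the application does not prescribe any particular voiceprint algorithm.

```python
import numpy as np

def extract_voiceprint_features(signal, sample_rate, rel_strength=0.1):
    """Sketch of step S11: convert a time-domain sound signal to the
    frequency domain and keep only components whose signal intensity
    exceeds a specified (here: relative) intensity."""
    spectrum = np.fft.rfft(signal)                       # time -> frequency domain
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    magnitude = np.abs(spectrum) / len(signal)
    strong = magnitude > rel_strength * magnitude.max()  # "greater than specified intensity"
    # A deliberately crude "voiceprint feature": the normalized magnitudes
    # of the strong components (a real system would use MFCCs or embeddings).
    features = magnitude[strong] / magnitude[strong].sum()
    return freqs[strong], features
```

For a pure 440 Hz tone, for example, only the 440 Hz bin survives the threshold.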
S13: judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, stopping playing the target video.
In this embodiment, after identifying at least one voiceprint feature included in the sound signal, the client may determine whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library. Specifically, the client may compare each identified voiceprint feature with the reference voiceprint features in the first reference voiceprint feature library one by one, and determine whether it matches any of them. If no identified voiceprint feature matches, the at least one voiceprint feature does not include a reference voiceprint feature in the first reference voiceprint feature library; if any identified voiceprint feature matches a reference voiceprint feature, the at least one voiceprint feature does include one. For example, when the client identifies a plurality of voiceprint features and at least one of them matches a reference voiceprint feature, the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library.
In this embodiment, the determining, by the client, whether the identified voiceprint feature matches the reference voiceprint feature may specifically include calculating, by the client, a matching degree between the identified voiceprint feature and the reference voiceprint feature, and when the matching degree is greater than or equal to a specified matching degree threshold, determining that the identified voiceprint feature matches the reference voiceprint feature; when the degree of match is less than a specified degree of match threshold, it may be determined that the identified voiceprint feature does not match the reference voiceprint feature. The value range of the specified matching degree threshold may be specifically 80 to 100 percent.
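As a sketch only, this matching-degree comparison could look as follows in Python, assuming cosine similarity as the matching measure; the application fixes only the 80-100 percent threshold range, not the measure itself, and all names here are illustrative.

```python
import numpy as np

SPECIFIED_MATCH_THRESHOLD = 0.8  # lower end of the 80-100 percent range

def matching_degree(feature, reference):
    """Cosine similarity as one possible matching degree between an
    identified voiceprint feature and a reference voiceprint feature."""
    f = np.asarray(feature, dtype=float)
    r = np.asarray(reference, dtype=float)
    return float(np.dot(f, r) / (np.linalg.norm(f) * np.linalg.norm(r)))

def contains_reference_feature(identified, library,
                               threshold=SPECIFIED_MATCH_THRESHOLD):
    """Step S13: does any identified voiceprint feature match any
    reference feature in the first reference voiceprint feature library?"""
    return any(matching_degree(f, r) >= threshold
               for f in identified for r in library)
```

An identical feature vector yields a matching degree of 1.0 and counts as a match; an orthogonal one yields 0.0 and does not.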
In this embodiment, when the determination result is that no reference voiceprint feature is included, the client may stop playing the target video. Thus, even while the user is absorbed in the video, once another person enters the user's private space and the client determines that the voiceprint features in the sound signal emitted by that person do not include a reference voiceprint feature, playback stops automatically, which effectively prevents the newcomer from hearing the played video.
In a specific application scenario, the client may be a tablet computer, and the user may use his or her own voiceprint feature as the reference voiceprint feature in the first reference voiceprint feature library. Suppose the user is watching a video played by the tablet computer in his or her own bedroom. When other people enter the bedroom while speaking, the tablet computer can receive the sound signals they emit and recognize the voiceprint features included in those signals. After identifying the voiceprint features, the tablet computer can determine whether they include the user's own voiceprint feature. Because the voiceprint features of the other people do not match the user's voiceprint feature, the determination result is that it is not included, and the tablet computer directly stops playing the video the user is watching. In this way, the other people entering the private space are effectively prevented from hearing the played video.
In one embodiment of the present application, people far away from the user's private space can usually not hear, or can barely hear, the video the user is watching there; nevertheless, sound signals they emit may still be received by the client and cause the video to stop. To avoid this, the client may also use the volume of the sound signal as a factor in controlling playback. After determining whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library, the method may further include: if the determination result is that no reference voiceprint feature is included, the client may split the sound signal into at least one sub-sound signal based on the identified voiceprint features, and determine whether the volume of the loudest sub-sound signal is greater than or equal to a specified volume threshold; if it is greater than or equal to the threshold, the client stops playing the target video, and if it is less, the client continues playing it. For example, suppose some people are chatting at a place far from the user's private space. After determining that the voiceprint features in their sound signal do not include a reference voiceprint feature, the client may split the signal into multiple sub-sound signals based on the identified voiceprint features and check the loudest one against the specified volume threshold. Because these people are far from the private space, the volume the client receives is typically below the threshold, so the target video keeps playing.
In this way, the video is not stopped merely because people are chatting far from the user's private space. The specified volume threshold may range from 15 to 25 decibels and may be set according to the actual application; it is not limited here.
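The volume check might be sketched as follows; the RMS-to-decibel conversion and the reference amplitude are assumptions of this sketch, since the application gives the 15-25 decibel range without defining how volume is to be measured.

```python
import numpy as np

SPECIFIED_VOLUME_THRESHOLD_DB = 20.0  # within the 15-25 decibel range above

def volume_db(sub_signal, reference_amplitude=1e-3):
    """RMS level of one sub-sound signal in decibels, relative to an
    assumed reference amplitude."""
    samples = np.asarray(sub_signal, dtype=float)
    rms = np.sqrt(np.mean(np.square(samples)))
    return float(20.0 * np.log10(max(rms, 1e-12) / reference_amplitude))

def should_stop_playback(sub_signals,
                         threshold_db=SPECIFIED_VOLUME_THRESHOLD_DB):
    """Stop the target video only if the loudest sub-sound signal
    reaches the specified volume threshold."""
    return max(volume_db(s) for s in sub_signals) >= threshold_db
```

With these assumed units, a full-scale tone is well above the threshold and a tone attenuated by a factor of ten thousand is well below it, so only the former would stop playback.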
In this embodiment, the client may split the sound signal into at least one path of sub-sound signals based on the identified voiceprint features. Specifically, the client may convert the sound signal from a time domain to a frequency domain to obtain a sound signal in the frequency domain, and then may identify a sub-sound signal in the frequency domain that matches the voiceprint feature from the sound signal in the frequency domain and convert the sub-sound signal in the frequency domain to a sub-sound signal in the time domain. In this way, the sound signal containing at least one voiceprint feature can be split into at least one sub-sound signal.
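A sketch of this frequency-domain splitting, with each voiceprint modelled simply as a frequency band; attributing frequency components to individual speakers this way is a strong simplification, and a real system would need an actual speaker-separation model.

```python
import numpy as np

def split_by_voiceprint(signal, sample_rate, voiceprint_bands):
    """Split a sound signal into one sub-sound signal per voiceprint by
    keeping only the frequency components attributed to that voiceprint
    (modelled here as a (low_hz, high_hz) band) and converting the
    result back to the time domain."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    sub_signals = []
    for low, high in voiceprint_bands:
        mask = (freqs >= low) & (freqs <= high)   # components of this "voiceprint"
        sub_signals.append(np.fft.irfft(spectrum * mask, n=len(signal)))
    return sub_signals
```

Mixing a 200 Hz and a 1000 Hz tone and splitting on the bands (100, 500) and (600, 1500) recovers the two tones as separate sub-signals.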
In an embodiment of the present application, a person who enters the user's private space may in practice leave quickly, or may chat with the user there only briefly. The client is therefore required to continue receiving sound signals from the sound source after stopping the target video and to control playback according to the situation, rather than keeping the video stopped indefinitely. Specifically, after stopping the target video, the method further includes: the client may continue to receive a new sound signal emitted by the sound source and identify at least one new voiceprint feature included in it; the client may then determine whether the at least one new voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library; if not, the client splits the new sound signal into at least one sub-sound signal based on the identified new voiceprint features and determines whether the volume of the loudest sub-sound signal is greater than or equal to the specified volume threshold; if it is greater than or equal to the threshold, the client keeps the target video stopped, and if it is less, the client resumes playing the target video. For example, after stopping the target video, the client may continue to receive a new sound signal emitted by another person, identify the new voiceprint features it contains, and, when the determination result is still that no reference voiceprint feature is included, split the new signal into multiple sub-sound signals and check the loudest one against the threshold.
Thus, while a person who has entered the private space is still chatting with the user, the determination result is greater than or equal to the specified volume threshold and the client keeps the target video stopped; once that person has left quickly, the result falls below the threshold and the client can resume playing the target video.
In an embodiment of the present application, after the target video stops playing, a person who has entered the user's private space usually does not leave immediately, and people chatting near the private space usually do not stop immediately; they often stay for a certain period of time. The client may therefore suspend reception for a while before checking again. Specifically, after stopping the target video, the method further includes: the client may stop receiving the sound signal emitted by the sound source, wait for a specified period, then receive a new sound signal emitted by the sound source and identify at least one new voiceprint feature included in it; the client may determine whether the at least one new voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library; if not, the client splits the new sound signal into at least one sub-sound signal based on the identified new voiceprint features and determines whether the volume of the loudest sub-sound signal is greater than or equal to the specified volume threshold; if it is greater than or equal to the threshold, the client keeps the target video stopped. The specified period may be 1 to 5 minutes and may be set according to the actual application; it is not limited here.
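The stop-wait-recheck behaviour of this embodiment can be sketched as one control round in Python. Every callable here is an injected stand-in for a client operation described above, and resuming playback when a reference voiceprint is found (or when the sound is quiet enough) is an assumption of this sketch; the embodiment spells out only the keep-stopped branch.

```python
import time

SPECIFIED_WAIT_SECONDS = 60  # within the suggested 1-5 minute range

def recheck_after_stop(receive_signal, identify_features, contains_reference,
                       loudest_subsignal_db, threshold_db=20.0,
                       wait_seconds=SPECIFIED_WAIT_SECONDS, sleep=time.sleep):
    """One 'wait, then re-check' round after the target video has been
    stopped. Returns 'resume' or 'stay_stopped'."""
    sleep(wait_seconds)                # stop receiving for the specified period
    new_signal = receive_signal()      # then receive a new sound signal
    features = identify_features(new_signal)
    if contains_reference(features):
        return 'resume'                # assumption: a known voice resumes playback
    if loudest_subsignal_db(new_signal, features) >= threshold_db:
        return 'stay_stopped'          # strangers still present and audible
    return 'resume'                    # too quiet to be heard: resume playback
```

Injecting the dependencies keeps the control logic testable without microphones or a real player.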
In this embodiment, the functions implemented in the above method steps may be implemented by a computer program, and the computer program may be stored in a computer storage medium. In particular, the computer storage medium may be coupled to a processor, which may thereby read the computer program from the computer storage medium. The computer storage medium may be configured to store a first library of reference voiceprint features and a played target video. The computer program, when executed by a processor, may implement the steps of:
S11: receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal;
S13: judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, stopping playing the target video.
In one embodiment, the computer program, when executed by a processor, further performs the steps of:
if not, splitting the sound signal into at least one sub sound signal based on the identified voiceprint features, and judging whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, stopping playing the target video.
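The splitting-and-threshold rule above can be sketched as a small decision function. The separation of the sound signal into per-voiceprint sub signals is a source-separation problem in its own right and is represented here only by its result; all names and the volume units are illustrative assumptions, not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class SubSignal:
    voiceprint: str  # the identified voiceprint feature this sub signal belongs to
    volume: float    # measured volume of the sub sound signal (e.g. in dB)


def should_stop_playback(sub_signals, reference_library, volume_threshold):
    """Client-side rule for the first reference voiceprint feature library.

    If any identified voiceprint feature is in the library, a permitted
    voice is present and playback continues. Otherwise playback stops only
    when the loudest sub sound signal reaches the specified volume threshold.
    """
    identified = {s.voiceprint for s in sub_signals}
    if identified & set(reference_library):
        return False  # a permitted voice is present; keep playing
    loudest = max((s.volume for s in sub_signals), default=0.0)
    return loudest >= volume_threshold
```

For example, an unknown voice at 55 (with a threshold of 40) stops playback, while the user's own voice at any volume does not.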
In one embodiment, the computer program, when executed by a processor, further performs the steps of:
continuing to receive a new sound signal emitted by the sound source and re-identifying at least one new voiceprint feature included in the new sound signal;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the first reference voiceprint feature library;
if not, splitting the new sound signal into at least one path of sub sound signals based on the identified new voiceprint characteristics, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
It should be noted that the functions realized by the computer program in the computer storage medium may all be understood with reference to the foregoing method embodiments, and the technical effects achieved are similar to those of the foregoing method embodiments; they are not described again here.
Referring to fig. 2, the present application further provides a player. The player may be the client described above. The player comprises a processor and the computer storage medium described above. The computer storage medium may be coupled to the processor, so that the processor can read the computer program from it. In addition, at least two acoustic wave sensors are arranged on the player. An acoustic wave sensor is a sensor capable of detecting acoustic waves; the acoustic waves may cover different frequency ranges and in particular may be sounds perceptible to a person, in which case the acoustic wave sensor is a sound sensor, for example a microphone of any of various types. The at least two acoustic wave sensors may be configured to receive the sound signal emitted by a sound source. Taking two acoustic wave sensors as an example, they can be disposed at different positions on the player, and both can receive the sound signal from the sound source.
In this embodiment, the computer storage medium may include a physical device for storing information, typically a medium that digitizes information and stores it by electrical, magnetic, or optical means. The computer storage medium according to this embodiment may further include: devices that store information using electrical energy, such as RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic-core memories, bubble memories, and USB flash drives; and devices that store information optically, such as CDs or DVDs. Of course, there are other forms of computer storage media, such as quantum memory, graphene memory, and the like.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth.
The specific functions implemented by the computer storage medium and the player provided in the embodiments of the present description may be explained with reference to the foregoing embodiments in the present description, and can achieve the technical effects of the foregoing embodiments, and thus, will not be described herein again.
In an embodiment of the present application, the execution subject of the method may further be split between the client and the server, so as to reduce the computing load on the client (originally the sole execution subject), improve operation efficiency, and reduce the manufacturing cost of the client. Accordingly, the present application may also provide a video playing method whose execution subject is the server described above. The server is provided with a first reference voiceprint feature library and a target video played by a client. Referring to fig. 3, the method includes the following steps.
S21: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal.
In this embodiment, the server may receive a sound signal sent by the client and identify at least one voiceprint feature included in the sound signal. The sound signal is received by the client after being emitted by a sound source, and is then sent to the server. Specifically, for the process by which the client receives the sound signal emitted by the sound source and the process by which the server identifies at least one voiceprint feature included in the sound signal, reference may be made to the corresponding implementation of step S11.
S23: judging whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library; and if not, feeding back, to the client, designated information indicating that playing of the target video is to be stopped.
In this embodiment, the server may judge whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library, and if not, feed back to the client the designated information indicating that playing of the target video is to be stopped, so that the client stops playing the target video after receiving the designated information. Specifically, for the process by which the server judges whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library, reference may be made to the corresponding implementation of step S13.
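Steps S21/S23 on the server side can be sketched as a single decision function that produces the designated information. The JSON message shape and the helper names are assumptions for illustration only; the disclosure does not fix any message format.

```python
import json

FIRST_REFERENCE_LIBRARY = {"user_voiceprint"}  # hypothetical stored library


def handle_sound_signal(identified_voiceprints):
    """Server-side decision for step S23.

    Returns the designated information (here, a small JSON message telling
    the client to stop playing the target video) when none of the identified
    voiceprint features matches the first reference voiceprint feature
    library, and None when a reference voiceprint was found.
    """
    if FIRST_REFERENCE_LIBRARY & set(identified_voiceprints):
        return None  # a permitted voice is present; nothing to feed back
    return json.dumps({"action": "stop_playback",
                       "reason": "no_reference_voiceprint_matched"})
```

In a real deployment the returned message would be pushed to the client over whatever channel the player already uses; the client then stops playback on receipt.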
In an embodiment of the present application, after the server judges whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library, the method may further include: if not, the server splits the sound signal into at least one sub sound signal based on the identified voiceprint features, and judges whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, feeds back to the client designated information indicating that playing of the target video is to be stopped.
In an embodiment of the present application, after feeding back to the client the designated information indicating that playing of the target video is to be stopped, the method may further include: the server may continue to receive new sound signals sent by the client and again identify at least one new voiceprint feature included in the new sound signal; judge whether the at least one new voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library; if not, split the new sound signal into at least one sub sound signal based on the identified new voiceprint features, and judge whether the volume of the loudest sub sound signal is greater than or equal to the specified volume threshold; and if it is greater than or equal to the specified volume threshold, continue to feed back to the client the designated information indicating that playing of the target video is to be stopped.
Referring to fig. 4, the present application further provides a server, which includes a memory and a processor, wherein the memory stores a first reference voiceprint feature library and a target video played by a client, and a computer program, when the computer program is executed by the processor, the server implements the following steps:
s21: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal;
s23: judging whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library; and if not, feeding back, to the client, designated information indicating that playing of the target video is to be stopped.
In this embodiment, the memory may include a physical device for storing information, typically a medium that digitizes information and stores it by electrical, magnetic, or optical means. The memory according to this embodiment may further include: devices that store information using electrical energy, such as RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic-core memories, bubble memories, and USB flash drives; and devices that store information optically, such as CDs or DVDs. Of course, there are other forms of memory, such as quantum memory, graphene memory, and so forth.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth.
In one embodiment, the computer program, when executed by the processor, further implements the following steps: if not, the server splits the sound signal into at least one sub sound signal based on the identified voiceprint features, and judges whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, feeds back to the client designated information indicating that playing of the target video is to be stopped.
In one embodiment, the computer program, when executed by the processor, further implements the steps of:
continuously receiving a new sound signal sent by the client, and identifying at least one new voiceprint feature included in the new sound signal again;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the first reference voiceprint feature library;
if not, splitting the new sound signal into at least one sub sound signal based on the identified new voiceprint features, and judging whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, continuing to feed back, to the client, designated information indicating that playing of the target video is to be stopped.
The specific functions implemented by the memory and the processor of the server provided in the embodiments of the present specification may be explained in comparison with the foregoing embodiments in the present specification, and can achieve the technical effects of the foregoing embodiments, and thus, no further description is provided herein.
The application also provides a video playing method. The method can be applied to the client. The method is provided with a second reference voiceprint feature library and a played target video.
In this embodiment, a second reference voiceprint feature library may be provided in the server. The second reference voiceprint feature library can be a data set storing voiceprint features, and may adopt any database format such as MySQL, Oracle, DB2, or Sybase. The second reference voiceprint feature library can be deployed on a storage medium in the server. Furthermore, the client may download the second reference voiceprint feature library from the server and store it in memory for subsequent voiceprint feature comparison; the memory here may be an internal memory or a cache.
In this embodiment, the second reference voiceprint feature library may include at least one reference voiceprint feature. The reference voiceprint features included in the second reference voiceprint feature library may be, for example, voiceprint features of other persons who can enter the private space of the user, or voiceprint features of certain persons designated by the user. Which persons' voiceprint features are included can be chosen according to the user's own wishes.
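Downloading the second reference voiceprint feature library and keeping it in memory for later comparisons could be organised as below. The transport is injected as a callable because the disclosure fixes neither a download protocol nor a storage format; everything here is an illustrative assumption.

```python
class VoiceprintLibraryCache:
    """Caches a reference voiceprint feature library downloaded from the
    server, so that later comparisons are served from memory (or a cache)."""

    def __init__(self, fetch):
        # `fetch` stands in for the actual download from the server,
        # e.g. an HTTP request returning a list of voiceprint features.
        self._fetch = fetch
        self._features = None

    def features(self):
        if self._features is None:          # download only once
            self._features = frozenset(self._fetch())
        return self._features

    def contains_any(self, identified):
        """True if any identified voiceprint feature is in the library."""
        return bool(self.features() & set(identified))
```

The single-download behaviour mirrors the text's suggestion that the client stores the downloaded library locally rather than querying the server for every comparison.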
Referring to fig. 5, the method for playing the video may include the following steps.
S31: a sound signal emitted by a sound source is received and at least one voiceprint feature included in the sound signal is identified.
In this embodiment, the client may receive a sound signal emitted by a sound source and identify at least one voiceprint feature included in the sound signal. The specific implementation process of this step is similar to step S11, and reference may be made to the implementation step corresponding to step S11.
S33: judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, stopping playing the target video.
In this embodiment, the client may judge whether the at least one voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library, and if so, stop playing the target video. The process by which the client makes this judgment is similar to the process in step S13 of judging whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library, and reference may be made to the corresponding implementation of step S13.
In an embodiment of the present application, after judging whether the at least one voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library, the method may further include: if so, splitting the sound signal into at least one sub sound signal based on the identified voiceprint features, and judging whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, stopping playing the target video.
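Note that the second library inverts the rule used with the first: a match now triggers stopping, since the matched voiceprint belongs to someone the played content should be hidden from. A minimal sketch under the same illustrative assumptions as before; the text does not spell out whether the volume threshold applies to the matched voice or to the loudest voice overall, so the sketch follows the literal wording and uses the loudest sub signal.

```python
def should_stop_for_second_library(sub_signals, second_library, volume_threshold):
    """Second-library rule: stop the target video when an identified
    voiceprint feature DOES appear in the second reference voiceprint
    feature library and the loudest sub sound signal reaches the
    specified volume threshold.

    sub_signals: iterable of (voiceprint, volume) pairs.
    """
    pairs = list(sub_signals)
    identified = {voiceprint for voiceprint, _ in pairs}
    if not identified & set(second_library):
        return False  # no listed person detected; keep playing
    loudest = max((volume for _, volume in pairs), default=0.0)
    return loudest >= volume_threshold
```

Contrast this with the first-library rule, where a match means a permitted voice and playback continues.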
In an embodiment of the present application, after stopping playing the target video, the method may further include: the client can continue to receive a new sound signal emitted by the sound source and identify at least one new voiceprint feature included in the new sound signal again; judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the second reference voiceprint feature library; if yes, splitting the new sound signal into at least one path of sub sound signal based on the identified new voiceprint feature, and judging whether the volume of the sub sound signal with the maximum volume is larger than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
In this embodiment, the functions implemented in the above method steps may be implemented by a computer program, and the computer program may be stored in a computer storage medium. In particular, the computer storage medium may be coupled to a processor, which may thereby read the computer program from the computer storage medium. The computer storage medium may be configured to store a second library of reference voiceprint features and the played target video. The computer program, when executed by a processor, may implement the steps of:
s31: receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal;
s33: judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, stopping playing the target video.
In one embodiment, the computer program, when executed by a processor, further performs the steps of:
if so, splitting the sound signal into at least one sub sound signal based on the identified voiceprint features, and judging whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, stopping playing the target video.
In one embodiment, the computer program, when executed by a processor, further performs the steps of:
continuing to receive a new sound signal emitted by the sound source and re-identifying at least one new voiceprint feature included in the new sound signal;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the second reference voiceprint feature library;
if yes, splitting the new sound signal into at least one path of sub sound signal based on the identified new voiceprint feature, and judging whether the volume of the sub sound signal with the maximum volume is larger than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
It should be noted that, the functions that can be realized by the computer program in the computer storage medium can all refer to the foregoing method implementation embodiments, and the technical effects achieved are also similar to the technical effects achieved in the foregoing method implementation embodiments, and are not described here again.
The application also provides a player. The player may be the client described above. The player comprises a processor and the computer storage medium described above. The computer storage medium may be coupled to the processor, so that the processor can read the computer program from it. In addition, at least two acoustic wave sensors are arranged on the player. An acoustic wave sensor is a sensor capable of detecting acoustic waves; the acoustic waves may cover different frequency ranges and in particular may be sounds perceptible to a person, in which case the acoustic wave sensor is a sound sensor, for example a microphone of any of various types. The at least two acoustic wave sensors may be configured to receive the sound signal emitted by a sound source. Taking two acoustic wave sensors as an example, they can be disposed at different positions on the player, and both can receive the sound signal from the sound source.
In this embodiment, the computer storage medium may include a physical device for storing information, typically a medium that digitizes information and stores it by electrical, magnetic, or optical means. The computer storage medium according to this embodiment may further include: devices that store information using electrical energy, such as RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic-core memories, bubble memories, and USB flash drives; and devices that store information optically, such as CDs or DVDs. Of course, there are other forms of computer storage media, such as quantum memory, graphene memory, and the like.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth.
The specific functions implemented by the computer storage medium and the player provided in the embodiments of the present description may be explained with reference to the foregoing embodiments in the present description, and can achieve the technical effects of the foregoing embodiments, and thus, will not be described herein again.
In an embodiment of the present application, the execution subject of the method may further be split between the client and the server, so as to reduce the computing load on the client (originally the sole execution subject), improve operation efficiency, and reduce the manufacturing cost of the client. Accordingly, the present application may also provide a video playing method whose execution subject is the server described above. The server is provided with a second reference voiceprint feature library and a target video played by the client. Referring to fig. 6, the method includes the following steps.
S41: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal.
In this embodiment, the server may receive a sound signal sent by the client and identify at least one voiceprint feature included in the sound signal. The sound signal is received by the client after being emitted by a sound source, and is then sent to the server. Specifically, for the process by which the client receives the sound signal emitted by the sound source and the process by which the server identifies at least one voiceprint feature included in the sound signal, reference may be made to the corresponding implementation of step S31.
S43: judging whether the at least one voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library; and if so, feeding back, to the client, designated information indicating that playing of the target video is to be stopped.
In this embodiment, the server may judge whether the at least one voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library, and if so, feed back to the client the designated information indicating that playing of the target video is to be stopped, so that the client stops playing the target video after receiving the designated information. Specifically, for the process by which the server makes this judgment, reference may be made to the corresponding implementation of step S33.
In an embodiment of the present application, after the server judges whether the at least one voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library, the method may further include: if so, the server may split the sound signal into at least one sub sound signal based on the identified voiceprint features, and judge whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, feed back to the client designated information indicating that playing of the target video is to be stopped.
In an embodiment of the present application, after feeding back to the client the designated information indicating that playing of the target video is to be stopped, the method may further include: the server may continue to receive new sound signals sent by the client and again identify at least one new voiceprint feature included in the new sound signal; judge whether the at least one new voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library; if so, split the new sound signal into at least one sub sound signal based on the identified new voiceprint features, and judge whether the volume of the loudest sub sound signal is greater than or equal to the specified volume threshold; and if it is greater than or equal to the specified volume threshold, continue to feed back to the client the designated information indicating that playing of the target video is to be stopped.
The present application further provides a server, which includes a memory and a processor, wherein the memory stores a second reference voiceprint feature library and a target video played by a client, and a computer program, and when the computer program is executed by the processor, the following steps are implemented:
s41: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal.
S43: judging whether the at least one voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library; and if so, feeding back, to the client, designated information indicating that playing of the target video is to be stopped.
In this embodiment, the memory may include a physical device for storing information, typically a medium that digitizes information and stores it by electrical, magnetic, or optical means. The memory according to this embodiment may further include: devices that store information using electrical energy, such as RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic-core memories, bubble memories, and USB flash drives; and devices that store information optically, such as CDs or DVDs. Of course, there are other forms of memory, such as quantum memory, graphene memory, and so forth.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth.
In one embodiment, the computer program, when executed by the processor, further implements the steps of:
if so, splitting the sound signal into at least one sub sound signal based on the identified voiceprint features, and judging whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, feeding back, to the client, designated information indicating that playing of the target video is to be stopped.
In one embodiment, the computer program, when executed by the processor, further implements the steps of:
continuing to receive new sound signals sent by the client, and again identifying at least one new voiceprint feature included in the new sound signal; judging whether the at least one new voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library; if so, splitting the new sound signal into at least one sub sound signal based on the identified new voiceprint features, and judging whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, continuing to feed back, to the client, designated information indicating that playing of the target video is to be stopped.
As can be seen from the above, in the present application the first reference voiceprint feature library includes at least one reference voiceprint feature, which may be, for example, a voiceprint feature of the user. While the user watches the target video being played in his or her private space, the client can continuously receive sound signals emitted by sound sources; for example, if another person enters the private space, the client can receive the sound signal emitted by that person and identify at least one voiceprint feature included in the sound signal. The client may then judge whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library, such as the user's own voiceprint feature; and if not, stop playing the target video. In this way, even if another person enters the private space while the user is absorbed in the video, the intrusion is detected immediately and playback stops, effectively preventing that person from hearing the video being played.
In the 1990s, an improvement in a technology could clearly be distinguished as either an improvement in hardware (for example, an improvement in a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement in a method flow). As technology has advanced, however, many of today's improvements in method flows can be regarded as direct improvements in hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Thus, it cannot be said that an improvement in a method flow cannot be realized by a hardware physical module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development, while the source code to be compiled must be written in a particular programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present.
It will also be apparent to those skilled in the art that hardware circuitry that implements the logical method flows can be readily obtained by merely slightly programming the method flows into an integrated circuit using the hardware description languages described above.
Those skilled in the art will also appreciate that, besides implementing the client and server purely as computer-readable program code, the same functions can be implemented entirely in hardware by logically programming the method steps, so that the client and server take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a client or server may therefore be regarded as a hardware component, and the means included in it for implementing various functions may also be regarded as structures within the hardware component. Indeed, means for implementing various functions may even be regarded both as software modules for implementing the method and as structures within the hardware component.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on such an understanding, the technical solutions of the present application may, in essence or in part, be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the embodiments of the computer storage medium, the server, and the client can all be understood with reference to the description of the foregoing method embodiments.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Although the present application has been described by way of embodiments, those of ordinary skill in the art will appreciate that the present application admits of numerous variations and modifications without departing from its spirit, and it is intended that the appended claims cover such variations and modifications.

Claims (23)

1. A video playing method, characterized in that a first reference voiceprint feature library and a target video being played are provided; the method comprises the following steps:
receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal;
judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, stopping playing the target video.
2. The method according to claim 1, wherein after determining whether the at least one voiceprint feature comprises a reference voiceprint feature from the first library of reference voiceprint features, the method further comprises:
if not, splitting the sound signal into at least one path of sub sound signal based on the identified voiceprint features, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, stopping playing the target video.
3. The method of claim 1, wherein after stopping playing the target video, the method further comprises:
continuing to receive a new sound signal emitted by the sound source and re-identifying at least one new voiceprint feature included in the new sound signal;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the first reference voiceprint feature library;
if not, splitting the new sound signal into at least one path of sub sound signals based on the identified new voiceprint characteristics, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
4. The method of claim 1, wherein after stopping playing the target video, the method further comprises:
stopping receiving the sound signal sent by the sound source, waiting for a specified time, receiving a new sound signal sent by the sound source again, and identifying at least one new voiceprint feature included in the new sound signal;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the first reference voiceprint feature library;
if not, splitting the new sound signal into at least one path of sub sound signals based on the identified new voiceprint characteristics, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
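Claims 1 to 4 describe a gating check: keep playing only while a voiceprint from the first reference library is heard, and otherwise stop playback, optionally only when the loudest sub sound signal reaches a volume threshold. A minimal Python sketch of that decision logic follows; the function and parameter names (`should_stop_playback`, `sub_signal_volumes`, `volume_threshold`) are illustrative assumptions, not terminology fixed by the claims.

```python
# Sketch of the decision logic in claims 1-4 (first-library behaviour).
# All names here are illustrative assumptions, not the patent's API.

def should_stop_playback(detected_features, reference_library,
                         sub_signal_volumes=None, volume_threshold=60):
    """Return True if playback of the target video should stop.

    detected_features   -- voiceprint features identified in the sound signal
    reference_library   -- the first reference voiceprint feature library
    sub_signal_volumes  -- optional {feature: volume} for the per-voiceprint
                           sub sound signals (the claim-2 refinement)
    volume_threshold    -- the "specified volume threshold" (arbitrary units)
    """
    # Claim 1: if any detected voiceprint is in the library, keep playing.
    if set(detected_features) & set(reference_library):
        return False
    # Claim 1 alone: no reference voiceprint present -> stop.
    if sub_signal_volumes is None:
        return True
    # Claim 2: otherwise stop only when the loudest sub sound signal
    # reaches the specified volume threshold.
    loudest = max(sub_signal_volumes.values(), default=0)
    return loudest >= volume_threshold
```

Splitting the signal into per-voiceprint sub sound signals and measuring their volumes (claims 2 to 4) is abstracted here as the `sub_signal_volumes` mapping; a real implementation would derive it with source separation and loudness estimation, which the claims do not specify.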
5. A computer storage medium for storing a first library of reference voiceprint features and a target video for playback, and a computer program; the computer program, when executed by a processor, performs the steps of:
receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal;
judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, stopping playing the target video.
6. The computer storage medium of claim 5, wherein the computer program, when executed by the processor, further performs the steps of:
if not, splitting the sound signal into at least one path of sub sound signal based on the identified voiceprint features, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, stopping playing the target video.
7. The computer storage medium of claim 5, wherein the computer program, when executed by the processor, further performs the steps of:
continuing to receive a new sound signal emitted by the sound source and re-identifying at least one new voiceprint feature included in the new sound signal;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the first reference voiceprint feature library;
if not, splitting the new sound signal into at least one path of sub sound signals based on the identified new voiceprint characteristics, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
8. A player, characterized by comprising a processor and the computer storage medium as claimed in any one of claims 5 to 7, wherein at least two acoustic wave sensors are disposed on the player; the at least two acoustic wave sensors are configured to receive sound signals emitted by a sound source.
9. A video playing method is characterized in that a first reference voiceprint feature library and a target video played by a client are provided; the method comprises the following steps:
receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal;
judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, feeding back specified information representing that the target video is stopped to be played to the client.
10. The method according to claim 9, wherein after determining whether the at least one voiceprint feature comprises a reference voiceprint feature from the first library of reference voiceprint features, the method further comprises:
if not, splitting the sound signal into at least one path of sub sound signal based on the identified voiceprint features, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, feeding back to the client the specified information representing that the target video is stopped from being played.
11. The method according to claim 9, wherein after feeding back to the client specified information characterizing that the target video is stopped from being played, the method further comprises:
continuously receiving a new sound signal sent by the client, and identifying at least one new voiceprint feature included in the new sound signal again;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the first reference voiceprint feature library;
if not, splitting the new sound signal into at least one path of sub sound signals based on the identified new voiceprint characteristics, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the specified volume threshold, continuing to feed back specified information representing that the target video is stopped to be played to the client.
12. A server, characterized in that the server comprises a memory and a processor, the memory having stored therein a first library of reference voiceprint features and a target video played by a client, and a computer program which, when executed by the processor, implements the method of any one of claims 9 to 11.
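In the server-side variant of claims 9 to 12, the voiceprint comparison runs on the server, and the client only receives "specified information" telling it to stop playback. One plausible shape for that exchange is sketched below in Python with an assumed dictionary message format; the patent does not specify the wire format, and the function names are hypothetical.

```python
# Hypothetical client/server exchange for claims 9-12. The message shape
# ({"stop_playback": bool}) and both function names are assumptions.

def server_handle(detected_features, reference_library):
    """Server side: decide and return the 'specified information' that
    tells the client whether to stop playing the target video."""
    has_reference = bool(set(detected_features) & set(reference_library))
    return {"stop_playback": not has_reference}

def client_apply(response, player_state):
    """Client side: act on the server's feedback without mutating input."""
    if response.get("stop_playback"):
        player_state = dict(player_state, playing=False)
    return player_state
```

In a deployment the client would also stream the captured sound signal (or extracted features) to the server, a step elided here because the claims leave the transport unspecified.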
13. A video playing method, characterized in that a second reference voiceprint feature library and a target video being played are provided; the method comprises the following steps:
receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal;
judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, stopping playing the target video.
14. The method according to claim 13, wherein after determining whether the at least one voiceprint feature comprises a reference voiceprint feature from the second library of reference voiceprint features, the method further comprises:
if yes, splitting the sound signal into at least one path of sub sound signal based on the identified voiceprint features, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, stopping playing the target video.
15. The method of claim 13, wherein after stopping playing the target video, the method further comprises:
continuing to receive a new sound signal emitted by the sound source and re-identifying at least one new voiceprint feature included in the new sound signal;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the second reference voiceprint feature library;
if yes, splitting the new sound signal into at least one path of sub sound signal based on the identified new voiceprint feature, and judging whether the volume of the sub sound signal with the maximum volume is larger than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
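Claims 13 to 15 invert the check: the second reference voiceprint feature library acts as a block list, so playback stops when a detected voiceprint is present in the library rather than absent from it. A hedged sketch of that inverted decision follows; every name in it is illustrative, not prescribed by the claims.

```python
# Sketch of the decision logic in claims 13-15 (second-library behaviour).
# Names are illustrative assumptions, not the patent's API.

def should_stop_playback_blacklist(detected_features, blacklist_library,
                                   sub_signal_volumes=None,
                                   volume_threshold=60):
    """Return True if playback of the target video should stop."""
    # Claim 13: stop only if some detected voiceprint IS in the library.
    matched = set(detected_features) & set(blacklist_library)
    if not matched:
        return False
    # Claim 13 alone: a listed voiceprint is present -> stop.
    if sub_signal_volumes is None:
        return True
    # Claim 14: otherwise stop only when the loudest sub sound signal
    # reaches the specified volume threshold.
    loudest = max(sub_signal_volumes.values(), default=0)
    return loudest >= volume_threshold
```

Structurally this is the mirror image of the first-library method: only the membership test flips, while the volume-threshold refinement is unchanged.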
16. A computer storage medium for storing a second library of reference voiceprint features and a target video for playback, and a computer program; the computer program, when executed by a processor, performs the steps of:
receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal;
judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, stopping playing the target video.
17. The computer storage medium of claim 16, wherein the computer program, when executed by the processor, further performs the steps of:
if yes, splitting the sound signal into at least one path of sub sound signal based on the identified voiceprint features, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, stopping playing the target video.
18. The computer storage medium of claim 16, wherein the computer program, when executed by the processor, further performs the steps of:
continuing to receive a new sound signal emitted by the sound source and re-identifying at least one new voiceprint feature included in the new sound signal;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the second reference voiceprint feature library;
if yes, splitting the new sound signal into at least one path of sub sound signal based on the identified new voiceprint feature, and judging whether the volume of the sub sound signal with the maximum volume is larger than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
19. A player, characterized by comprising a processor and the computer storage medium as claimed in any one of claims 16 to 18, wherein at least two acoustic wave sensors are disposed on the player; the at least two acoustic wave sensors are configured to receive sound signals emitted by a sound source.
20. A video playing method is characterized in that a second reference voiceprint feature library and a target video played by a client are provided; the method comprises the following steps:
receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal;
judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, feeding back specified information representing that the target video is stopped to be played to the client.
21. The method according to claim 20, wherein after determining whether the at least one voiceprint feature comprises a reference voiceprint feature from the second library of reference voiceprint features, the method further comprises:
if yes, splitting the sound signal into at least one path of sub sound signal based on the identified voiceprint features, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, feeding back to the client the specified information representing that the target video is stopped from being played.
22. The method of claim 20, wherein after feeding back to the client specified information characterizing the stop of playing the target video, the method further comprises:
continuously receiving a new sound signal sent by the client, and identifying at least one new voiceprint feature included in the new sound signal again;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the second reference voiceprint feature library;
if yes, splitting the new sound signal into at least one path of sub sound signal based on the identified new voiceprint feature, and judging whether the volume of the sub sound signal with the maximum volume is larger than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the specified volume threshold, continuing to feed back specified information representing that the target video is stopped to be played to the client.
23. A server, characterized in that the server comprises a memory and a processor, the memory having stored therein a second library of reference voiceprint features and a target video played by a client, and a computer program which, when executed by the processor, implements the method of any one of claims 20 to 22.
CN201810890020.9A 2018-08-07 2018-08-07 Video playing method, computer storage medium, player and server Pending CN110830837A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810890020.9A CN110830837A (en) 2018-08-07 2018-08-07 Video playing method, computer storage medium, player and server


Publications (1)

Publication Number Publication Date
CN110830837A 2020-02-21

Family

ID=69534038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810890020.9A Pending CN110830837A (en) 2018-08-07 2018-08-07 Video playing method, computer storage medium, player and server

Country Status (1)

Country Link
CN (1) CN110830837A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130079908A1 (en) * 2011-09-28 2013-03-28 Hon Hai Precision Industry Co., Ltd. Electronic device with automatic pause function and method thereof
CN103035274A (en) * 2011-09-30 2013-04-10 富泰华工业(深圳)有限公司 Electronic device and method with multimedia file play pausing function
CN105451037A (en) * 2015-11-17 2016-03-30 小米科技有限责任公司 Working method of equipment and apparatus thereof
CN105810219A (en) * 2016-03-11 2016-07-27 宇龙计算机通信科技(深圳)有限公司 Multimedia file playing method and playing system, and audio terminal
US20170221500A1 (en) * 2016-02-02 2017-08-03 Ebay Inc. Personalized, real-time audio processing
CN108307238A (en) * 2018-01-23 2018-07-20 北京中企智达知识产权代理有限公司 Video playing control method, system and device


Similar Documents

Publication Publication Date Title
JP6883119B2 (en) Key phrase detection with audio watermark
US10249301B2 (en) Method and system for speech recognition processing
US9734830B2 (en) Speech recognition wake-up of a handheld portable electronic device
WO2017084185A1 (en) Intelligent terminal control method and system based on semantic analysis, and intelligent terminal
JP2017538341A (en) Volume control method, system, device and program
CN105793921A (en) Initiating actions based on partial hotwords
JP2016522910A (en) Adaptive audio frame processing for keyword detection
WO2017154282A1 (en) Voice processing device and voice processing method
CN108055617B (en) Microphone awakening method and device, terminal equipment and storage medium
US11201598B2 (en) Volume adjusting method and mobile terminal
CN111755002B (en) Speech recognition device, electronic apparatus, and speech recognition method
US20120053937A1 (en) Generalizing text content summary from speech content
CN116806355A (en) Speech shortcut detection with speaker verification
US20210082405A1 (en) Method for Location Reminder and Electronic Device
US20230395077A1 (en) Device finder using voice authentication
CN110830837A (en) Video playing method, computer storage medium, player and server
US10693944B1 (en) Media-player initialization optimization
WO2022143349A1 (en) Method and device for determining user intent
CN107707721B (en) Recording method and device of mobile terminal, storage medium and mobile terminal
US10489192B2 (en) Method and controlling apparatus for automatically terminating an application of an electronic apparatus based on audio volume level being adjusted lower than a threshold audio volume level by a user
US20200213732A1 (en) Volume adjusting method, device, and terminal device
KR20230147157A (en) Contextual suppression of assistant command(s)
CN112104949B (en) Method and device for detecting pickup assembly and electronic equipment
US20150100321A1 (en) Intelligent state aware system control utilizing two-way voice / audio communication
CN111045641A (en) Electronic terminal and voice recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200512

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Alibaba (China) Co.,Ltd.

Address before: 100102 No. 4 Building, Wangjing Dongyuan District, Chaoyang District, Beijing

Applicant before: BEIJING YOUKU TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200221
