CN110830837A - Video playing method, computer storage medium, player and server - Google Patents

Video playing method, computer storage medium, player and server

Info

Publication number
CN110830837A
CN110830837A (application CN201810890020.9A)
Authority
CN
China
Prior art keywords
voiceprint feature
sound signal
target video
new
volume
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810890020.9A
Other languages
Chinese (zh)
Inventor
刘雷晴 (Liu Leiqing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Beijing Youku Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youku Technology Co Ltd
Priority to CN201810890020.9A
Publication of CN110830837A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/433 Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41 Structure of client; Structure of client peripherals
    • H04N 21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N 21/42201 Biosensors, e.g. heat sensor for presence detection, EEG sensors or any limb activity sensors worn by the user
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41 Structure of client; Structure of client peripherals
    • H04N 21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N 21/42203 Sound input device, e.g. microphone

Abstract

The embodiments of the application disclose a video playing method, a computer storage medium, a player and a server. The method is provided with a first reference voiceprint feature library and a target video being played, and comprises the following steps: receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, stopping playing the target video. With this technical scheme, other people who enter a private space can be effectively prevented from hearing the played video.

Description

Video playing method, computer storage medium, player and server
Technical Field
The present application relates to the field of internet technologies, and in particular, to a video playing method, a computer storage medium, a player, and a server.
Background
In daily life, when watching a video in a private space such as a bedroom or a study, people usually do not want others who enter that space to hear the video being played.
At present, a user can only stop the video manually upon noticing that someone else has entered the private space, so as to keep that person from hearing the video being played. However, when the user is absorbed in the video, it is often hard to notice that someone has entered, and in that case it is difficult to keep others from hearing the played video.
Disclosure of Invention
An object of the present application is to provide a video playing method, a computer storage medium, a player and a server that can effectively prevent other people entering a private space from hearing the played video.
In order to achieve the above object, the present application provides a method for playing a video, where the method provides a first reference voiceprint feature library and a played target video; the method comprises the following steps: receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, stopping playing the target video.
In order to achieve the above object, the present application further provides a computer storage medium for storing a first reference voiceprint feature library and a played target video, and a computer program; the computer program, when executed by a processor, performs the steps of: receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, stopping playing the target video.
In order to achieve the above object, the present application further provides a player, where the player includes a processor and a computer storage medium, and at least two acoustic wave sensors are disposed on the player; the at least two acoustic wave sensors are used for receiving sound signals sent by a sound source.
In order to achieve the above object, an embodiment of the present application further provides a method for playing a video, where the method provides a first reference voiceprint feature library and a target video played by a client; the method comprises the following steps: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, feeding back specified information representing that the target video is stopped to be played to the client.
To achieve the above object, the present application further provides a server, which includes a memory and a processor, wherein the memory stores a first reference voiceprint feature library and a target video played by a client, and a computer program, when the computer program is executed by the processor, the server implements the following steps: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, feeding back specified information representing that the target video is stopped to be played to the client.
In order to achieve the above object, the present application further provides a method for playing a video, where the method is provided with a second reference voiceprint feature library and a played target video; the method comprises the following steps: receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, stopping playing the target video.
In order to achieve the above object, the present application further provides a computer storage medium for storing a second reference voiceprint feature library and a played target video, and a computer program; the computer program, when executed by a processor, performs the steps of: receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, stopping playing the target video.
In order to achieve the above object, an embodiment of the present application further provides a method for playing a video, where the method is provided with a second reference voiceprint feature library and a target video played by a client; the method comprises the following steps: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, feeding back specified information representing that the target video is stopped to be played to the client.
To achieve the above object, the present application provides a server, which includes a memory and a processor, the memory stores a second reference voiceprint feature library and a target video played by a client, and a computer program, when executed by the processor, the computer program implements the following steps: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal; judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, feeding back specified information representing that the target video is stopped to be played to the client.
As can be seen from the above, in the present application the first reference voiceprint feature library includes at least one reference voiceprint feature, which may be, for example, the voiceprint feature of the user. When the user watches a target video being played in his or her private space, the client can continuously receive sound signals emitted by sound sources; for example, if another person enters the private space, the client can receive the sound signal emitted by that person and identify at least one voiceprint feature included in it. The client may then determine whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library, such as the user's own voiceprint feature, and if not, stop playing the target video. In this way, even if another person enters the private space while the user is absorbed in the video, the client can detect the intrusion immediately and stop playback, so the situation that other people entering the private space hear the played video can be effectively avoided.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flowchart of an embodiment of a video playing method in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a player in an embodiment of the present application;
FIG. 3 is a flowchart of another embodiment of a video playing method in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a server in an embodiment of the present application;
FIG. 5 is a flowchart of another embodiment of a video playing method in an embodiment of the present application;
FIG. 6 is a flowchart of another embodiment of a video playing method in an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art without inventive work, based on the embodiments in the present application, shall fall within the scope of protection of the present application.
The embodiments of the application provide a video playing method that can be applied in a client-server system architecture. The server may be a device that stores video data. In particular, the server may be a backend business server of a website capable of providing video services, such as iQiyi, Sohu Video, Tencent Video, or AcFun. In this embodiment, the server may be an electronic device having data computation, storage, and network interaction functions; it may also be software running in such an electronic device to support data processing, storage, and network interaction. The number of servers is not particularly limited in this embodiment: the server may be one server, several servers, or a server cluster formed by several servers.
In this embodiment, the client may be an electronic device that renders video data and can capture sound signals. Specifically, the client may be, for example, a desktop computer, a tablet computer, a notebook computer, a smart phone, a digital assistant, a smart wearable device, or a television with network access, each having a sound capturing function. Alternatively, the client may be software running in such an electronic device. Specifically, the client may be a browser in the electronic device, and the browser may load an access portal provided by a video website platform such as iQiyi, Sohu Video, or AcFun; the access portal may be the home page of the website platform. The client may also be an application, provided by a video website platform, that runs in an intelligent terminal.
The embodiment of the application provides a video playing method, which can be applied to the client. The method may be provided with a first library of reference voiceprint features and a target video for playback.
In this embodiment, the server may be provided with a first reference voiceprint feature library. The first library of reference voiceprint features can be a data set storing voiceprint features. The first reference voiceprint feature library can adopt any one of database formats such as MySQL, Oracle, DB2 and Sybase. The first library of reference voiceprint features can be deployed on a storage medium in a server. Furthermore, the client may download the first reference voiceprint feature library from the server, and store the downloaded first reference voiceprint feature library in a memory, so as to perform subsequent voiceprint feature comparison. The storage may be a memory or a cache.
In this embodiment, the first reference voiceprint feature library may include at least one reference voiceprint feature. The reference voiceprint features included in the first reference voiceprint feature library may be, for example, the voiceprint features of the user, or other voiceprint features set by the user. Other voiceprint features can be selected according to the user's own wishes, for example, voiceprint features of family members of the user, or voiceprint features of close friends of the user, and the like.
In this embodiment, the target video played on the client may be a video currently viewed by the user. The video currently watched by the user may be played after the client renders the video data representing the target video. Wherein the video data can be downloaded by the client from a video database stored in the server. The video database may be a data set storing video data. The video database may be in any one of MySQL, Oracle, DB2, Sybase, etc. database formats.
Referring to fig. 1, the method for playing the video may include the following steps.
S11: a sound signal emitted by a sound source is received and at least one voiceprint feature included in the sound signal is identified.
In the present embodiment, the sound source may be a person who emits sound. Specifically, the sound source may be a person who enters or is near the user's private space, such as the user's bedroom or study. In practice, when a user watches a video played by the client in his or her bedroom or study, a person entering that room, or a person near it, may generate sound signals by speaking. Because such people are within or near the user's private space, they may hear the video played by the client. It is therefore necessary to capture the sound signals emitted by these sound sources through the client and perform subsequent processing to control video playback, so as to prevent these people from hearing the video the user is watching.
In this embodiment, the sound source may be one person or a plurality of persons, the sounds of different persons having different voiceprint characteristics. Then, the sound signal may include a sound signal emitted by at least one person. Thus, the sound signal may include at least one voiceprint feature therein.
In this embodiment, the client may receive a sound signal emitted by a sound source. Specifically, for example, a sound signal emitted from the sound source may be received by a microphone mounted on the client. Wherein, the number of the microphones loaded on the client can be one or more. In practical applications, microphones may be respectively disposed in different directions on the client, so that the client can receive sound signals from different directions as accurately as possible.
In this embodiment, after receiving a sound signal emitted by a sound source, the client may also identify at least one voiceprint feature included in the sound signal. Specifically, for example, after receiving the sound signal emitted by the sound source, the client may convert the sound signal from the time domain to the frequency domain to obtain the sound signal in the frequency domain, and may identify the voiceprint feature of the sound signal with the signal intensity greater than the specified intensity from the sound signal in the frequency domain.
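For illustration only, this conversion-and-identification step could be sketched in Python as below. The plain FFT, the relative strength threshold, and the crude spectral feature vector are all assumptions of this sketch; the application does not prescribe any particular voiceprint algorithm.

```python
import numpy as np

def extract_voiceprint_features(signal, sample_rate, rel_strength=0.1):
    """Sketch of step S11: convert a time-domain sound signal to the
    frequency domain and keep only components whose signal intensity
    exceeds a specified (here: relative) intensity."""
    spectrum = np.fft.rfft(signal)                       # time -> frequency domain
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    magnitude = np.abs(spectrum) / len(signal)
    strong = magnitude > rel_strength * magnitude.max()  # "greater than specified intensity"
    # A deliberately crude "voiceprint feature": the normalized magnitudes
    # of the strong components (a real system would use MFCCs or embeddings).
    features = magnitude[strong] / magnitude[strong].sum()
    return freqs[strong], features
```

For a pure 440 Hz tone, for example, only the 440 Hz bin survives the threshold.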
S13: judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, stopping playing the target video.
In this embodiment, after identifying at least one voiceprint feature included in the sound signal, the client may determine whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library. Specifically, the client may compare each identified voiceprint feature with the reference voiceprint features in the first reference voiceprint feature library one by one, and determine whether it matches any of them. If no identified voiceprint feature matches, the at least one voiceprint feature does not include a reference voiceprint feature in the first reference voiceprint feature library; if any identified voiceprint feature matches a reference voiceprint feature, the at least one voiceprint feature does include one. For example, when the client identifies a plurality of voiceprint features and at least one of them matches a reference voiceprint feature, the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library.
In this embodiment, the determining, by the client, whether the identified voiceprint feature matches the reference voiceprint feature may specifically include calculating, by the client, a matching degree between the identified voiceprint feature and the reference voiceprint feature, and when the matching degree is greater than or equal to a specified matching degree threshold, determining that the identified voiceprint feature matches the reference voiceprint feature; when the degree of match is less than a specified degree of match threshold, it may be determined that the identified voiceprint feature does not match the reference voiceprint feature. The value range of the specified matching degree threshold may be specifically 80 to 100 percent.
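As a sketch only, this matching-degree comparison could look as follows in Python, assuming cosine similarity as the matching measure; the application fixes only the 80-100 percent threshold range, not the measure itself, and all names here are illustrative.

```python
import numpy as np

SPECIFIED_MATCH_THRESHOLD = 0.8  # lower end of the 80-100 percent range

def matching_degree(feature, reference):
    """Cosine similarity as one possible matching degree between an
    identified voiceprint feature and a reference voiceprint feature."""
    f = np.asarray(feature, dtype=float)
    r = np.asarray(reference, dtype=float)
    return float(np.dot(f, r) / (np.linalg.norm(f) * np.linalg.norm(r)))

def contains_reference_feature(identified, library,
                               threshold=SPECIFIED_MATCH_THRESHOLD):
    """Step S13: does any identified voiceprint feature match any
    reference feature in the first reference voiceprint feature library?"""
    return any(matching_degree(f, r) >= threshold
               for f in identified for r in library)
```

An identical feature vector yields a matching degree of 1.0 and counts as a match; an orthogonal one yields 0.0 and does not.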
In this embodiment, when the determination result is that no reference voiceprint feature is included, the client may stop playing the target video. Thus, even while the user is absorbed in the video, once another person enters the user's private space and the client determines that the voiceprint features in the sound signal emitted by that person do not include a reference voiceprint feature, playback stops automatically, which effectively prevents the newcomer from hearing the played video.
In a specific application scenario, the client may be a tablet computer, and the user may use his or her own voiceprint feature as the reference voiceprint feature in the first reference voiceprint feature library. Suppose the user is watching a video played by the tablet computer in his or her own bedroom. When other people enter the bedroom while speaking, the tablet computer can receive the sound signals they emit and recognize the voiceprint features included in those signals. After identifying the voiceprint features, the tablet computer can determine whether they include the user's own voiceprint feature. Because the voiceprint features of the other people do not match the user's voiceprint feature, the determination result is that it is not included, and the tablet computer directly stops playing the video the user is watching. In this way, the other people entering the private space are effectively prevented from hearing the played video.
In one embodiment of the present application, people far away from the user's private space can usually not hear, or can barely hear, the video the user is watching there; nevertheless, sound signals they emit may still be received by the client and cause the video to stop. To avoid this, the client may also use the volume of the sound signal as a factor in controlling playback. After determining whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library, the method may further include: if the determination result is that no reference voiceprint feature is included, the client may split the sound signal into at least one sub-sound signal based on the identified voiceprint features, and determine whether the volume of the loudest sub-sound signal is greater than or equal to a specified volume threshold; if it is greater than or equal to the threshold, the client stops playing the target video, and if it is less, the client continues playing it. For example, suppose some people are chatting at a place far from the user's private space. After determining that the voiceprint features in their sound signal do not include a reference voiceprint feature, the client may split the signal into multiple sub-sound signals based on the identified voiceprint features and check the loudest one against the specified volume threshold. Because these people are far from the private space, the volume the client receives is typically below the threshold, so the target video keeps playing.
In this way, the video is not stopped merely because people are chatting far from the user's private space. The specified volume threshold may range from 15 to 25 decibels and may be set according to the actual application; it is not limited here.
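The volume check might be sketched as follows; the RMS-to-decibel conversion and the reference amplitude are assumptions of this sketch, since the application gives the 15-25 decibel range without defining how volume is to be measured.

```python
import numpy as np

SPECIFIED_VOLUME_THRESHOLD_DB = 20.0  # within the 15-25 decibel range above

def volume_db(sub_signal, reference_amplitude=1e-3):
    """RMS level of one sub-sound signal in decibels, relative to an
    assumed reference amplitude."""
    samples = np.asarray(sub_signal, dtype=float)
    rms = np.sqrt(np.mean(np.square(samples)))
    return float(20.0 * np.log10(max(rms, 1e-12) / reference_amplitude))

def should_stop_playback(sub_signals,
                         threshold_db=SPECIFIED_VOLUME_THRESHOLD_DB):
    """Stop the target video only if the loudest sub-sound signal
    reaches the specified volume threshold."""
    return max(volume_db(s) for s in sub_signals) >= threshold_db
```

With these assumed units, a full-scale tone is well above the threshold and a tone attenuated by a factor of ten thousand is well below it, so only the former would stop playback.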
In this embodiment, the client may split the sound signal into at least one path of sub-sound signals based on the identified voiceprint features. Specifically, the client may convert the sound signal from a time domain to a frequency domain to obtain a sound signal in the frequency domain, and then may identify a sub-sound signal in the frequency domain that matches the voiceprint feature from the sound signal in the frequency domain and convert the sub-sound signal in the frequency domain to a sub-sound signal in the time domain. In this way, the sound signal containing at least one voiceprint feature can be split into at least one sub-sound signal.
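A sketch of this frequency-domain splitting, with each voiceprint modelled simply as a frequency band; attributing frequency components to individual speakers this way is a strong simplification, and a real system would need an actual speaker-separation model.

```python
import numpy as np

def split_by_voiceprint(signal, sample_rate, voiceprint_bands):
    """Split a sound signal into one sub-sound signal per voiceprint by
    keeping only the frequency components attributed to that voiceprint
    (modelled here as a (low_hz, high_hz) band) and converting the
    result back to the time domain."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    sub_signals = []
    for low, high in voiceprint_bands:
        mask = (freqs >= low) & (freqs <= high)   # components of this "voiceprint"
        sub_signals.append(np.fft.irfft(spectrum * mask, n=len(signal)))
    return sub_signals
```

Mixing a 200 Hz and a 1000 Hz tone and splitting on the bands (100, 500) and (600, 1500) recovers the two tones as separate sub-signals.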
In an embodiment of the present application, a person who enters the user's private space may in practice leave quickly, or may chat with the user there only briefly. The client is therefore required to continue receiving sound signals from the sound source after stopping the target video and to control playback according to the situation, rather than keeping the video stopped indefinitely. Specifically, after stopping the target video, the method further includes: the client may continue to receive a new sound signal emitted by the sound source and identify at least one new voiceprint feature included in it; the client may then determine whether the at least one new voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library; if not, the client splits the new sound signal into at least one sub-sound signal based on the identified new voiceprint features and determines whether the volume of the loudest sub-sound signal is greater than or equal to the specified volume threshold; if it is greater than or equal to the threshold, the client keeps the target video stopped, and if it is less, the client resumes playing the target video. For example, after stopping the target video, the client may continue to receive a new sound signal emitted by another person, identify the new voiceprint features it contains, and, when the determination result is still that no reference voiceprint feature is included, split the new signal into multiple sub-sound signals and check the loudest one against the threshold.
Thus, while a person who has entered the private space is still chatting with the user, the determination result is greater than or equal to the specified volume threshold and the client keeps the target video stopped; once that person has left quickly, the result falls below the threshold and the client can resume playing the target video.
In an embodiment of the present application, after the target video stops playing, a person who has entered the user's private space usually does not leave immediately, and people chatting near the private space usually do not stop immediately; they often stay for a certain period of time. The client may therefore suspend reception for a while before checking again. Specifically, after stopping the target video, the method further includes: the client may stop receiving the sound signal emitted by the sound source, wait for a specified period, then receive a new sound signal emitted by the sound source and identify at least one new voiceprint feature included in it; the client may determine whether the at least one new voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library; if not, the client splits the new sound signal into at least one sub-sound signal based on the identified new voiceprint features and determines whether the volume of the loudest sub-sound signal is greater than or equal to the specified volume threshold; if it is greater than or equal to the threshold, the client keeps the target video stopped. The specified period may be 1 to 5 minutes and may be set according to the actual application; it is not limited here.
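The stop-wait-recheck behaviour of this embodiment can be sketched as one control round in Python. Every callable here is an injected stand-in for a client operation described above, and resuming playback when a reference voiceprint is found (or when the sound is quiet enough) is an assumption of this sketch; the embodiment spells out only the keep-stopped branch.

```python
import time

SPECIFIED_WAIT_SECONDS = 60  # within the suggested 1-5 minute range

def recheck_after_stop(receive_signal, identify_features, contains_reference,
                       loudest_subsignal_db, threshold_db=20.0,
                       wait_seconds=SPECIFIED_WAIT_SECONDS, sleep=time.sleep):
    """One 'wait, then re-check' round after the target video has been
    stopped. Returns 'resume' or 'stay_stopped'."""
    sleep(wait_seconds)                # stop receiving for the specified period
    new_signal = receive_signal()      # then receive a new sound signal
    features = identify_features(new_signal)
    if contains_reference(features):
        return 'resume'                # assumption: a known voice resumes playback
    if loudest_subsignal_db(new_signal, features) >= threshold_db:
        return 'stay_stopped'          # strangers still present and audible
    return 'resume'                    # too quiet to be heard: resume playback
```

Injecting the dependencies keeps the control logic testable without microphones or a real player.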
In this embodiment, the functions implemented in the above method steps may be implemented by a computer program, and the computer program may be stored in a computer storage medium. In particular, the computer storage medium may be coupled to a processor, which may thereby read the computer program from the computer storage medium. The computer storage medium may be configured to store a first library of reference voiceprint features and a played target video. The computer program, when executed by a processor, may implement the steps of:
S11: receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal;
S13: judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, stopping playing the target video.
In one embodiment, the computer program, when executed by a processor, further performs the steps of:
if not, splitting the sound signal into at least one sub sound signal based on the identified voiceprint features, and judging whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, stopping playing the target video.
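The splitting-and-threshold rule above can be sketched as a small decision function. The separation of the sound signal into per-voiceprint sub signals is a source-separation problem in its own right and is represented here only by its result; all names and the volume units are illustrative assumptions, not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class SubSignal:
    voiceprint: str  # the identified voiceprint feature this sub signal belongs to
    volume: float    # measured volume of the sub sound signal (e.g. in dB)


def should_stop_playback(sub_signals, reference_library, volume_threshold):
    """Client-side rule for the first reference voiceprint feature library.

    If any identified voiceprint feature is in the library, a permitted
    voice is present and playback continues. Otherwise playback stops only
    when the loudest sub sound signal reaches the specified volume threshold.
    """
    identified = {s.voiceprint for s in sub_signals}
    if identified & set(reference_library):
        return False  # a permitted voice is present; keep playing
    loudest = max((s.volume for s in sub_signals), default=0.0)
    return loudest >= volume_threshold
```

For example, an unknown voice at 55 (with a threshold of 40) stops playback, while the user's own voice at any volume does not.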
In one embodiment, the computer program, when executed by a processor, further performs the steps of:
continuing to receive a new sound signal emitted by the sound source and re-identifying at least one new voiceprint feature included in the new sound signal;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the first reference voiceprint feature library;
if not, splitting the new sound signal into at least one path of sub sound signals based on the identified new voiceprint characteristics, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
It should be noted that the functions realized by the computer program in the computer storage medium may all be understood with reference to the foregoing method embodiments, and the technical effects achieved are similar to those of the foregoing method embodiments; they are not described again here.
Referring to fig. 2, the present application further provides a player. The player may be the client described above. The player comprises a processor and the computer storage medium described above. The computer storage medium may be coupled to the processor, so that the processor can read the computer program from it. In addition, at least two acoustic wave sensors are arranged on the player. An acoustic wave sensor is a sensor capable of detecting acoustic waves; the acoustic waves may cover different frequency ranges and in particular may be sounds perceptible to a person, in which case the acoustic wave sensor is a sound sensor, for example a microphone of any of various types. The at least two acoustic wave sensors may be configured to receive the sound signal emitted by a sound source. Taking two acoustic wave sensors as an example, they can be disposed at different positions on the player, and both can receive the sound signal from the sound source.
In this embodiment, the computer storage medium may include a physical device for storing information, typically a medium that digitizes information and stores it by electrical, magnetic, or optical means. The computer storage medium according to this embodiment may further include: devices that store information using electrical energy, such as RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic-core memories, bubble memories, and USB flash drives; and devices that store information optically, such as CDs or DVDs. Of course, there are other forms of computer storage media, such as quantum memory, graphene memory, and the like.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth.
The specific functions implemented by the computer storage medium and the player provided in the embodiments of the present description may be explained with reference to the foregoing embodiments in the present description, and can achieve the technical effects of the foregoing embodiments, and thus, will not be described herein again.
In an embodiment of the present application, the execution subject of the method may further be split between the client and the server, so as to reduce the computing load on the client (originally the sole execution subject), improve operation efficiency, and reduce the manufacturing cost of the client. Accordingly, the present application may also provide a video playing method whose execution subject is the server described above. The server is provided with a first reference voiceprint feature library and a target video played by a client. Referring to fig. 3, the method includes the following steps.
S21: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal.
In this embodiment, the server may receive a sound signal sent by the client and identify at least one voiceprint feature included in the sound signal. The sound signal is received by the client after being emitted by a sound source, and is then sent to the server. Specifically, for the process by which the client receives the sound signal emitted by the sound source and the process by which the server identifies at least one voiceprint feature included in the sound signal, reference may be made to the corresponding implementation of step S11.
S23: judging whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library; and if not, feeding back, to the client, designated information indicating that playing of the target video is to be stopped.
In this embodiment, the server may judge whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library, and if not, feed back to the client the designated information indicating that playing of the target video is to be stopped, so that the client stops playing the target video after receiving the designated information. Specifically, for the process by which the server judges whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library, reference may be made to the corresponding implementation of step S13.
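Steps S21/S23 on the server side can be sketched as a single decision function that produces the designated information. The JSON message shape and the helper names are assumptions for illustration only; the disclosure does not fix any message format.

```python
import json

FIRST_REFERENCE_LIBRARY = {"user_voiceprint"}  # hypothetical stored library


def handle_sound_signal(identified_voiceprints):
    """Server-side decision for step S23.

    Returns the designated information (here, a small JSON message telling
    the client to stop playing the target video) when none of the identified
    voiceprint features matches the first reference voiceprint feature
    library, and None when a reference voiceprint was found.
    """
    if FIRST_REFERENCE_LIBRARY & set(identified_voiceprints):
        return None  # a permitted voice is present; nothing to feed back
    return json.dumps({"action": "stop_playback",
                       "reason": "no_reference_voiceprint_matched"})
```

In a real deployment the returned message would be pushed to the client over whatever channel the player already uses; the client then stops playback on receipt.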
In an embodiment of the present application, after the server judges whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library, the method may further include: if not, the server splits the sound signal into at least one sub sound signal based on the identified voiceprint features, and judges whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, feeds back to the client designated information indicating that playing of the target video is to be stopped.
In an embodiment of the present application, after feeding back to the client the designated information indicating that playing of the target video is to be stopped, the method may further include: the server may continue to receive new sound signals sent by the client and again identify at least one new voiceprint feature included in the new sound signal; judge whether the at least one new voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library; if not, split the new sound signal into at least one sub sound signal based on the identified new voiceprint features, and judge whether the volume of the loudest sub sound signal is greater than or equal to the specified volume threshold; and if it is greater than or equal to the specified volume threshold, continue to feed back to the client the designated information indicating that playing of the target video is to be stopped.
Referring to fig. 4, the present application further provides a server, which includes a memory and a processor, wherein the memory stores a first reference voiceprint feature library and a target video played by a client, and a computer program, when the computer program is executed by the processor, the server implements the following steps:
s21: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal;
s23: judging whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library; and if not, feeding back, to the client, designated information indicating that playing of the target video is to be stopped.
In this embodiment, the memory may include a physical device for storing information, typically a medium that digitizes information and stores it by electrical, magnetic, or optical means. The memory according to this embodiment may further include: devices that store information using electrical energy, such as RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic-core memories, bubble memories, and USB flash drives; and devices that store information optically, such as CDs or DVDs. Of course, there are other forms of memory, such as quantum memory, graphene memory, and so forth.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth.
In one embodiment, the computer program, when executed by the processor, further implements the following steps: if not, the server splits the sound signal into at least one sub sound signal based on the identified voiceprint features, and judges whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, feeds back to the client designated information indicating that playing of the target video is to be stopped.
In one embodiment, the computer program, when executed by the processor, further implements the steps of:
continuously receiving a new sound signal sent by the client, and identifying at least one new voiceprint feature included in the new sound signal again;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the first reference voiceprint feature library;
if not, splitting the new sound signal into at least one sub sound signal based on the identified new voiceprint features, and judging whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, continuing to feed back, to the client, designated information indicating that playing of the target video is to be stopped.
The specific functions implemented by the memory and the processor of the server provided in the embodiments of the present specification may be explained in comparison with the foregoing embodiments in the present specification, and can achieve the technical effects of the foregoing embodiments, and thus, no further description is provided herein.
The application also provides a video playing method. The method can be applied to the client. The method is provided with a second reference voiceprint feature library and a played target video.
In this embodiment, a second reference voiceprint feature library may be provided in the server. The second reference voiceprint feature library can be a data set storing voiceprint features, and may adopt any database format such as MySQL, Oracle, DB2, or Sybase. The second reference voiceprint feature library can be deployed on a storage medium in the server. Furthermore, the client may download the second reference voiceprint feature library from the server and store it in memory for subsequent voiceprint feature comparison; the memory here may be an internal memory or a cache.
In this embodiment, the second reference voiceprint feature library may include at least one reference voiceprint feature. The reference voiceprint features included in the second reference voiceprint feature library may be, for example, voiceprint features of other persons who can enter the private space of the user, or voiceprint features of certain persons designated by the user. Which persons' voiceprint features are included can be chosen according to the user's own wishes.
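Downloading the second reference voiceprint feature library and keeping it in memory for later comparisons could be organised as below. The transport is injected as a callable because the disclosure fixes neither a download protocol nor a storage format; everything here is an illustrative assumption.

```python
class VoiceprintLibraryCache:
    """Caches a reference voiceprint feature library downloaded from the
    server, so that later comparisons are served from memory (or a cache)."""

    def __init__(self, fetch):
        # `fetch` stands in for the actual download from the server,
        # e.g. an HTTP request returning a list of voiceprint features.
        self._fetch = fetch
        self._features = None

    def features(self):
        if self._features is None:          # download only once
            self._features = frozenset(self._fetch())
        return self._features

    def contains_any(self, identified):
        """True if any identified voiceprint feature is in the library."""
        return bool(self.features() & set(identified))
```

The single-download behaviour mirrors the text's suggestion that the client stores the downloaded library locally rather than querying the server for every comparison.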
Referring to fig. 5, the method for playing the video may include the following steps.
S31: a sound signal emitted by a sound source is received and at least one voiceprint feature included in the sound signal is identified.
In this embodiment, the client may receive a sound signal emitted by a sound source and identify at least one voiceprint feature included in the sound signal. The specific implementation process of this step is similar to step S11, and reference may be made to the implementation step corresponding to step S11.
S33: judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, stopping playing the target video.
In this embodiment, the client may judge whether the at least one voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library, and if so, stop playing the target video. The process by which the client makes this judgment is similar to the process in step S13 of judging whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library, and reference may be made to the corresponding implementation of step S13.
In an embodiment of the present application, after judging whether the at least one voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library, the method may further include: if so, splitting the sound signal into at least one sub sound signal based on the identified voiceprint features, and judging whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, stopping playing the target video.
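Note that the second library inverts the rule used with the first: a match now triggers stopping, since the matched voiceprint belongs to someone the played content should be hidden from. A minimal sketch under the same illustrative assumptions as before; the text does not spell out whether the volume threshold applies to the matched voice or to the loudest voice overall, so the sketch follows the literal wording and uses the loudest sub signal.

```python
def should_stop_for_second_library(sub_signals, second_library, volume_threshold):
    """Second-library rule: stop the target video when an identified
    voiceprint feature DOES appear in the second reference voiceprint
    feature library and the loudest sub sound signal reaches the
    specified volume threshold.

    sub_signals: iterable of (voiceprint, volume) pairs.
    """
    pairs = list(sub_signals)
    identified = {voiceprint for voiceprint, _ in pairs}
    if not identified & set(second_library):
        return False  # no listed person detected; keep playing
    loudest = max((volume for _, volume in pairs), default=0.0)
    return loudest >= volume_threshold
```

Contrast this with the first-library rule, where a match means a permitted voice and playback continues.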
In an embodiment of the present application, after stopping playing the target video, the method may further include: the client can continue to receive a new sound signal emitted by the sound source and identify at least one new voiceprint feature included in the new sound signal again; judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the second reference voiceprint feature library; if yes, splitting the new sound signal into at least one path of sub sound signal based on the identified new voiceprint feature, and judging whether the volume of the sub sound signal with the maximum volume is larger than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
In this embodiment, the functions implemented in the above method steps may be implemented by a computer program, and the computer program may be stored in a computer storage medium. In particular, the computer storage medium may be coupled to a processor, which may thereby read the computer program from the computer storage medium. The computer storage medium may be configured to store a second library of reference voiceprint features and the played target video. The computer program, when executed by a processor, may implement the steps of:
s31: receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal;
s33: judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, stopping playing the target video.
In one embodiment, the computer program, when executed by a processor, further performs the steps of:
if so, splitting the sound signal into at least one sub sound signal based on the identified voiceprint features, and judging whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, stopping playing the target video.
In one embodiment, the computer program, when executed by a processor, further performs the steps of:
continuing to receive a new sound signal emitted by the sound source and re-identifying at least one new voiceprint feature included in the new sound signal;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the second reference voiceprint feature library;
if yes, splitting the new sound signal into at least one path of sub sound signal based on the identified new voiceprint feature, and judging whether the volume of the sub sound signal with the maximum volume is larger than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
It should be noted that, the functions that can be realized by the computer program in the computer storage medium can all refer to the foregoing method implementation embodiments, and the technical effects achieved are also similar to the technical effects achieved in the foregoing method implementation embodiments, and are not described here again.
The application also provides a player. The player may be the client described above. The player comprises a processor and the computer storage medium described above. The computer storage medium may be coupled to the processor, so that the processor can read the computer program from it. In addition, at least two acoustic wave sensors are arranged on the player. An acoustic wave sensor is a sensor capable of detecting acoustic waves; the acoustic waves may cover different frequency ranges and in particular may be sounds perceptible to a person, in which case the acoustic wave sensor is a sound sensor, for example a microphone of any of various types. The at least two acoustic wave sensors may be configured to receive the sound signal emitted by a sound source. Taking two acoustic wave sensors as an example, they can be disposed at different positions on the player, and both can receive the sound signal from the sound source.
In this embodiment, the computer storage medium may include a physical device for storing information, typically a medium that digitizes information and stores it by electrical, magnetic, or optical means. The computer storage medium according to this embodiment may further include: devices that store information using electrical energy, such as RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic-core memories, bubble memories, and USB flash drives; and devices that store information optically, such as CDs or DVDs. Of course, there are other forms of computer storage media, such as quantum memory, graphene memory, and the like.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth.
The specific functions implemented by the computer storage medium and the player provided in the embodiments of the present description may be explained with reference to the foregoing embodiments in the present description, and can achieve the technical effects of the foregoing embodiments, and thus, will not be described herein again.
In an embodiment of the present application, the execution subject of the method may further be split between the client and the server, so as to reduce the computing load on the client (originally the sole execution subject), improve operation efficiency, and reduce the manufacturing cost of the client. Accordingly, the present application may also provide a video playing method whose execution subject is the server described above. The server is provided with a second reference voiceprint feature library and a target video played by the client. Referring to fig. 6, the method includes the following steps.
S41: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal.
In this embodiment, the server may receive a sound signal sent by the client and identify at least one voiceprint feature included in the sound signal. The sound signal is received by the client after being emitted by a sound source, and is then sent to the server. Specifically, for the process by which the client receives the sound signal emitted by the sound source and the process by which the server identifies at least one voiceprint feature included in the sound signal, reference may be made to the corresponding implementation of step S31.
S43: judging whether the at least one voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library; and if so, feeding back, to the client, designated information indicating that playing of the target video is to be stopped.
In this embodiment, the server may judge whether the at least one voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library, and if so, feed back to the client the designated information indicating that playing of the target video is to be stopped, so that the client stops playing the target video after receiving the designated information. Specifically, for the process by which the server makes this judgment, reference may be made to the corresponding implementation of step S33.
In an embodiment of the present application, after the server judges whether the at least one voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library, the method may further include: if so, the server may split the sound signal into at least one sub sound signal based on the identified voiceprint features, and judge whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, feed back to the client designated information indicating that playing of the target video is to be stopped.
In an embodiment of the present application, after feeding back to the client the designated information indicating that playing of the target video is to be stopped, the method may further include: the server may continue to receive new sound signals sent by the client and again identify at least one new voiceprint feature included in the new sound signal; judge whether the at least one new voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library; if so, split the new sound signal into at least one sub sound signal based on the identified new voiceprint features, and judge whether the volume of the loudest sub sound signal is greater than or equal to the specified volume threshold; and if it is greater than or equal to the specified volume threshold, continue to feed back to the client the designated information indicating that playing of the target video is to be stopped.
The present application further provides a server, which includes a memory and a processor, wherein the memory stores a second reference voiceprint feature library and a target video played by a client, and a computer program, and when the computer program is executed by the processor, the following steps are implemented:
s41: receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal.
S43: judging whether the at least one voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library; and if so, feeding back, to the client, designated information indicating that playing of the target video is to be stopped.
In this embodiment, the memory may include a physical device for storing information, typically a medium that digitizes information and stores it by electrical, magnetic, or optical means. The memory according to this embodiment may further include: devices that store information using electrical energy, such as RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic-core memories, bubble memories, and USB flash drives; and devices that store information optically, such as CDs or DVDs. Of course, there are other forms of memory, such as quantum memory, graphene memory, and so forth.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth.
In one embodiment, the computer program, when executed by the processor, further implements the steps of:
if so, splitting the sound signal into at least one sub sound signal based on the identified voiceprint features, and judging whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, feeding back, to the client, designated information indicating that playing of the target video is to be stopped.
In one embodiment, the computer program, when executed by the processor, further implements the steps of:
continuing to receive new sound signals sent by the client, and again identifying at least one new voiceprint feature included in the new sound signal; judging whether the at least one new voiceprint feature includes a reference voiceprint feature in the second reference voiceprint feature library; if so, splitting the new sound signal into at least one sub sound signal based on the identified new voiceprint features, and judging whether the volume of the loudest sub sound signal is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, continuing to feed back, to the client, designated information indicating that playing of the target video is to be stopped.
As can be seen from the above, in the present application the first reference voiceprint feature library includes at least one reference voiceprint feature, which may be, for example, a voiceprint feature of the user. While the user watches the target video being played in his or her private space, the client can continuously receive sound signals emitted by sound sources; for example, if another person enters the private space, the client can receive the sound signal emitted by that person and identify at least one voiceprint feature included in the sound signal. The client may then judge whether the at least one voiceprint feature includes a reference voiceprint feature in the first reference voiceprint feature library, such as the user's own voiceprint feature; and if not, stop playing the target video. In this way, even if another person enters the private space while the user is absorbed in the video, the intrusion is detected immediately and playback stops, effectively preventing that person from hearing the video being played.
In the 1990s, an improvement in a technology could clearly be distinguished as either an improvement in hardware (for example, an improvement in a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement in a method flow). As technology has advanced, however, many of today's improvements in method flows can be regarded as direct improvements in hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Thus, it cannot be said that an improvement in a method flow cannot be realized by a hardware physical module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development, while the source code to be compiled must be written in a particular programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present.
It will also be apparent to those skilled in the art that hardware circuitry that implements the logical method flows can be readily obtained by merely slightly programming the method flows into an integrated circuit using the hardware description languages described above.
Those skilled in the art will also appreciate that, besides implementing the client and server purely as computer-readable program code, the same functions can be implemented entirely in hardware by logically programming the method steps, so that the client and server take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a client or server may therefore be regarded as a hardware component, and the means included in it for implementing various functions may also be regarded as structures within the hardware component. Indeed, means for implementing various functions may even be regarded both as software modules for implementing the method and as structures within the hardware component.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on such an understanding, the technical solutions of the present application may, in essence or in part, be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the embodiments of the computer storage medium, the server, and the client can all be understood with reference to the description of the foregoing method embodiments.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Although the present application has been described by way of embodiments, those of ordinary skill in the art will appreciate that the present application admits of numerous variations and modifications without departing from its spirit, and it is intended that the appended claims cover such variations and modifications.

Claims (23)

1. A video playing method, characterized in that a first reference voiceprint feature library and a target video being played are provided; the method comprises the following steps:
receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal;
judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, stopping playing the target video.
2. The method according to claim 1, wherein after determining whether the at least one voiceprint feature comprises a reference voiceprint feature from the first library of reference voiceprint features, the method further comprises:
if not, splitting the sound signal into at least one path of sub sound signal based on the identified voiceprint features, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, stopping playing the target video.
3. The method of claim 1, wherein after stopping playing the target video, the method further comprises:
continuing to receive a new sound signal emitted by the sound source and re-identifying at least one new voiceprint feature included in the new sound signal;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the first reference voiceprint feature library;
if not, splitting the new sound signal into at least one path of sub sound signals based on the identified new voiceprint characteristics, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
4. The method of claim 1, wherein after stopping playing the target video, the method further comprises:
stopping receiving the sound signal sent by the sound source, waiting for a specified time, receiving a new sound signal sent by the sound source again, and identifying at least one new voiceprint feature included in the new sound signal;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the first reference voiceprint feature library;
if not, splitting the new sound signal into at least one path of sub sound signals based on the identified new voiceprint characteristics, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
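Claims 1 to 4 describe a gating check: keep playing only while a voiceprint from the first reference library is heard, and otherwise stop playback, optionally only when the loudest sub sound signal reaches a volume threshold. A minimal Python sketch of that decision logic follows; the function and parameter names (`should_stop_playback`, `sub_signal_volumes`, `volume_threshold`) are illustrative assumptions, not terminology fixed by the claims.

```python
# Sketch of the decision logic in claims 1-4 (first-library behaviour).
# All names here are illustrative assumptions, not the patent's API.

def should_stop_playback(detected_features, reference_library,
                         sub_signal_volumes=None, volume_threshold=60):
    """Return True if playback of the target video should stop.

    detected_features   -- voiceprint features identified in the sound signal
    reference_library   -- the first reference voiceprint feature library
    sub_signal_volumes  -- optional {feature: volume} for the per-voiceprint
                           sub sound signals (the claim-2 refinement)
    volume_threshold    -- the "specified volume threshold" (arbitrary units)
    """
    # Claim 1: if any detected voiceprint is in the library, keep playing.
    if set(detected_features) & set(reference_library):
        return False
    # Claim 1 alone: no reference voiceprint present -> stop.
    if sub_signal_volumes is None:
        return True
    # Claim 2: otherwise stop only when the loudest sub sound signal
    # reaches the specified volume threshold.
    loudest = max(sub_signal_volumes.values(), default=0)
    return loudest >= volume_threshold
```

Splitting the signal into per-voiceprint sub sound signals and measuring their volumes (claims 2 to 4) is abstracted here as the `sub_signal_volumes` mapping; a real implementation would derive it with source separation and loudness estimation, which the claims do not specify.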
5. A computer storage medium for storing a first library of reference voiceprint features and a target video for playback, and a computer program; the computer program, when executed by a processor, performs the steps of:
receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal;
judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, stopping playing the target video.
6. The computer storage medium of claim 5, wherein the computer program, when executed by the processor, further performs the steps of:
if not, splitting the sound signal into at least one path of sub sound signal based on the identified voiceprint features, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, stopping playing the target video.
7. The computer storage medium of claim 5, wherein the computer program, when executed by the processor, further performs the steps of:
continuing to receive a new sound signal emitted by the sound source and re-identifying at least one new voiceprint feature included in the new sound signal;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the first reference voiceprint feature library;
if not, splitting the new sound signal into at least one path of sub sound signals based on the identified new voiceprint characteristics, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
8. A player, characterized by comprising a processor and the computer storage medium as claimed in any one of claims 5 to 7, wherein at least two acoustic wave sensors are disposed on the player; the at least two acoustic wave sensors are configured to receive sound signals emitted by a sound source.
9. A video playing method is characterized in that a first reference voiceprint feature library and a target video played by a client are provided; the method comprises the following steps:
receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal;
judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the first reference voiceprint feature library; and if not, feeding back specified information representing that the target video is stopped to be played to the client.
10. The method according to claim 9, wherein after determining whether the at least one voiceprint feature comprises a reference voiceprint feature from the first library of reference voiceprint features, the method further comprises:
if not, splitting the sound signal into at least one path of sub sound signal based on the identified voiceprint features, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, feeding back to the client the specified information representing that the target video is stopped from being played.
11. The method according to claim 9, wherein after feeding back to the client specified information characterizing that the target video is stopped from being played, the method further comprises:
continuously receiving a new sound signal sent by the client, and identifying at least one new voiceprint feature included in the new sound signal again;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the first reference voiceprint feature library;
if not, splitting the new sound signal into at least one path of sub sound signals based on the identified new voiceprint characteristics, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the specified volume threshold, continuing to feed back specified information representing that the target video is stopped to be played to the client.
12. A server, characterized in that the server comprises a memory and a processor, the memory having stored therein a first library of reference voiceprint features and a target video played by a client, and a computer program which, when executed by the processor, implements the method of any one of claims 9 to 11.
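In the server-side variant of claims 9 to 12, the voiceprint comparison runs on the server, and the client only receives "specified information" telling it to stop playback. One plausible shape for that exchange is sketched below in Python with an assumed dictionary message format; the patent does not specify the wire format, and the function names are hypothetical.

```python
# Hypothetical client/server exchange for claims 9-12. The message shape
# ({"stop_playback": bool}) and both function names are assumptions.

def server_handle(detected_features, reference_library):
    """Server side: decide and return the 'specified information' that
    tells the client whether to stop playing the target video."""
    has_reference = bool(set(detected_features) & set(reference_library))
    return {"stop_playback": not has_reference}

def client_apply(response, player_state):
    """Client side: act on the server's feedback without mutating input."""
    if response.get("stop_playback"):
        player_state = dict(player_state, playing=False)
    return player_state
```

In a deployment the client would also stream the captured sound signal (or extracted features) to the server, a step elided here because the claims leave the transport unspecified.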
13. A video playing method, characterized in that a second reference voiceprint feature library and a target video being played are provided; the method comprises the following steps:
receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal;
judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, stopping playing the target video.
14. The method according to claim 13, wherein after determining whether the at least one voiceprint feature comprises a reference voiceprint feature from the second library of reference voiceprint features, the method further comprises:
if yes, splitting the sound signal into at least one path of sub sound signal based on the identified voiceprint features, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, stopping playing the target video.
15. The method of claim 13, wherein after stopping playing the target video, the method further comprises:
continuing to receive a new sound signal emitted by the sound source and re-identifying at least one new voiceprint feature included in the new sound signal;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the second reference voiceprint feature library;
if yes, splitting the new sound signal into at least one path of sub sound signal based on the identified new voiceprint feature, and judging whether the volume of the sub sound signal with the maximum volume is larger than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
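Claims 13 to 15 invert the check: the second reference voiceprint feature library acts as a block list, so playback stops when a detected voiceprint is present in the library rather than absent from it. A hedged sketch of that inverted decision follows; every name in it is illustrative, not prescribed by the claims.

```python
# Sketch of the decision logic in claims 13-15 (second-library behaviour).
# Names are illustrative assumptions, not the patent's API.

def should_stop_playback_blacklist(detected_features, blacklist_library,
                                   sub_signal_volumes=None,
                                   volume_threshold=60):
    """Return True if playback of the target video should stop."""
    # Claim 13: stop only if some detected voiceprint IS in the library.
    matched = set(detected_features) & set(blacklist_library)
    if not matched:
        return False
    # Claim 13 alone: a listed voiceprint is present -> stop.
    if sub_signal_volumes is None:
        return True
    # Claim 14: otherwise stop only when the loudest sub sound signal
    # reaches the specified volume threshold.
    loudest = max(sub_signal_volumes.values(), default=0)
    return loudest >= volume_threshold
```

Structurally this is the mirror image of the first-library method: only the membership test flips, while the volume-threshold refinement is unchanged.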
16. A computer storage medium for storing a second library of reference voiceprint features and a target video for playback, and a computer program; the computer program, when executed by a processor, performs the steps of:
receiving a sound signal emitted by a sound source, and identifying at least one voiceprint feature included in the sound signal;
judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, stopping playing the target video.
17. The computer storage medium of claim 16, wherein the computer program, when executed by the processor, further performs the steps of:
if yes, splitting the sound signal into at least one path of sub sound signal based on the identified voiceprint features, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, stopping playing the target video.
18. The computer storage medium of claim 16, wherein the computer program, when executed by the processor, further performs the steps of:
continuing to receive a new sound signal emitted by the sound source and re-identifying at least one new voiceprint feature included in the new sound signal;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the second reference voiceprint feature library;
if yes, splitting the new sound signal into at least one path of sub sound signal based on the identified new voiceprint feature, and judging whether the volume of the sub sound signal with the maximum volume is larger than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the designated volume threshold, continuing to stop playing the target video.
19. A player, characterized by comprising a processor and the computer storage medium as claimed in any one of claims 16 to 18, wherein at least two acoustic wave sensors are disposed on the player; the at least two acoustic wave sensors are configured to receive sound signals emitted by a sound source.
20. A video playing method is characterized in that a second reference voiceprint feature library and a target video played by a client are provided; the method comprises the following steps:
receiving a sound signal sent by a client, and identifying at least one voiceprint feature included in the sound signal;
judging whether the at least one voiceprint feature comprises a reference voiceprint feature in the second reference voiceprint feature library; and if so, feeding back specified information representing that the target video is stopped to be played to the client.
21. The method according to claim 20, wherein after determining whether the at least one voiceprint feature comprises a reference voiceprint feature from the second library of reference voiceprint features, the method further comprises:
if yes, splitting the sound signal into at least one path of sub sound signal based on the identified voiceprint features, and judging whether the volume of the sub sound signal with the maximum volume is greater than or equal to a specified volume threshold; and if it is greater than or equal to the specified volume threshold, feeding back to the client the specified information representing that the target video is stopped from being played.
22. The method of claim 20, wherein after feeding back to the client specified information characterizing the stop of playing the target video, the method further comprises:
continuously receiving a new sound signal sent by the client, and identifying at least one new voiceprint feature included in the new sound signal again;
judging whether the at least one new voiceprint feature contains a reference voiceprint feature in the second reference voiceprint feature library;
if yes, splitting the new sound signal into at least one path of sub sound signal based on the identified new voiceprint feature, and judging whether the volume of the sub sound signal with the maximum volume is larger than or equal to a specified volume threshold value or not; and if the volume is larger than or equal to the specified volume threshold, continuing to feed back specified information representing that the target video is stopped to be played to the client.
23. A server, characterized in that the server comprises a memory and a processor, the memory having stored therein a second library of reference voiceprint features and a target video played by a client, and a computer program which, when executed by the processor, implements the method of any one of claims 20 to 22.
CN201810890020.9A 2018-08-07 2018-08-07 Video playing method, computer storage medium, player and server Pending CN110830837A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810890020.9A CN110830837A (en) 2018-08-07 2018-08-07 Video playing method, computer storage medium, player and server


Publications (1)

Publication Number Publication Date
CN110830837A 2020-02-21

Family

ID=69534038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810890020.9A Pending CN110830837A (en) 2018-08-07 2018-08-07 Video playing method, computer storage medium, player and server

Country Status (1)

Country Link
CN (1) CN110830837A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130079908A1 (en) * 2011-09-28 2013-03-28 Hon Hai Precision Industry Co., Ltd. Electronic device with automatic pause function and method thereof
CN103035274A (en) * 2011-09-30 2013-04-10 富泰华工业(深圳)有限公司 Electronic device and method with multimedia file play pausing function
CN105451037A (en) * 2015-11-17 2016-03-30 小米科技有限责任公司 Working method of equipment and apparatus thereof
CN105810219A (en) * 2016-03-11 2016-07-27 宇龙计算机通信科技(深圳)有限公司 Multimedia file playing method and playing system, and audio terminal
US20170221500A1 (en) * 2016-02-02 2017-08-03 Ebay Inc. Personalized, real-time audio processing
CN108307238A (en) * 2018-01-23 2018-07-20 北京中企智达知识产权代理有限公司 Video playing control method, system and device


Similar Documents

Publication Publication Date Title
JP6883119B2 (en) Key phrase detection with audio watermark
US10249301B2 (en) Method and system for speech recognition processing
US9734830B2 (en) Speech recognition wake-up of a handheld portable electronic device
WO2017084185A1 (en) Intelligent terminal control method and system based on semantic analysis, and intelligent terminal
JP2017538341A (en) Volume control method, system, device and program
CN105793921A (en) Initiating actions based on partial hotwords
JP2016522910A (en) Adaptive audio frame processing for keyword detection
WO2017154282A1 (en) Voice processing device and voice processing method
CN108055617B (en) Microphone awakening method and device, terminal equipment and storage medium
US11201598B2 (en) Volume adjusting method and mobile terminal
CN111755002B (en) Speech recognition device, electronic apparatus, and speech recognition method
US20120053937A1 (en) Generalizing text content summary from speech content
CN116806355A (en) Speech shortcut detection with speaker verification
US20210082405A1 (en) Method for Location Reminder and Electronic Device
US20230395077A1 (en) Device finder using voice authentication
CN110830837A (en) Video playing method, computer storage medium, player and server
US10693944B1 (en) Media-player initialization optimization
WO2022143349A1 (en) Method and device for determining user intent
CN107707721B (en) Recording method and device of mobile terminal, storage medium and mobile terminal
US10489192B2 (en) Method and controlling apparatus for automatically terminating an application of an electronic apparatus based on audio volume level being adjusted lower than a threshold audio volume level by a user
US20200213732A1 (en) Volume adjusting method, device, and terminal device
KR20230147157A (en) Contextual suppression of assistant command(s)
CN112104949B (en) Method and device for detecting pickup assembly and electronic equipment
US20150100321A1 (en) Intelligent state aware system control utilizing two-way voice / audio communication
CN111045641A (en) Electronic terminal and voice recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200512

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Alibaba (China) Co.,Ltd.

Address before: 100102 No. 4 Building, Wangjing Dongyuan District, Chaoyang District, Beijing

Applicant before: BEIJING YOUKU TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200221
