WO2021098175A1 - Guiding method, apparatus, device, and computer storage medium for a voice packet recording function - Google Patents

Guiding method, apparatus, device, and computer storage medium for a voice packet recording function

Info

Publication number
WO2021098175A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
voice
map
recording
users
Prior art date
Application number
PCT/CN2020/092155
Other languages
English (en)
French (fr)
Inventor
马文韬
黄际洲
雷锦艺
丁世强
Original Assignee
百度在线网络技术(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201911140137.6A external-priority patent/CN112825256B/zh
Application filed by 百度在线网络技术(北京)有限公司 filed Critical 百度在线网络技术(北京)有限公司
Priority to EP20820758.9A priority Critical patent/EP3851803B1/en
Priority to US17/254,814 priority patent/US11976931B2/en
Priority to JP2021515173A priority patent/JP7225380B2/ja
Priority to KR1020217008183A priority patent/KR102440635B1/ko
Publication of WO2021098175A1 publication Critical patent/WO2021098175A1/zh

Classifications

    • G06Q50/10 Services (ICT specially adapted for business processes of specific business sectors)
    • G11C7/16 Storage of analogue signals in digital stores using analogue/digital [A/D] and digital/analogue [D/A] converters
    • G06Q30/0251 Targeted advertisements
    • G01C21/3629 Guidance using speech or audio output, e.g. text-to-speech
    • G01C21/3697 Output of additional, non-guidance related information, e.g. low fuel level
    • G06F18/23213 Non-hierarchical clustering techniques with a fixed number of clusters, e.g. K-means clustering
    • G06F18/2411 Classification based on the proximity to a decision surface, e.g. support vector machines
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G11B20/00 Signal processing not specific to the method of recording or reproducing

Definitions

  • This application relates to the field of computer application technology, and in particular to a method, apparatus, device, and computer storage medium for guiding a voice packet recording function in the field of big data technology.
  • Recording personalized voice packets is a brand-new, technologically advanced function, and most users are not yet aware of it.
  • Traditional approaches, such as pushing promotional information to users, deliver to all users in full and at the same time: every user frequently receives such information, delivery accuracy is poor, and some users are excessively disturbed.
  • This application therefore aims to reduce the excessive disturbance that promotional information causes to users.
  • This application provides a method for guiding the voice packet recording function, the method including:
  • using the historical map usage behaviors of map users to identify target users with voice packet recording requirements, which includes:
  • a first classification model obtained through pre-training recognizes the map user based on the map user's feature vector, and obtains a recognition result of whether the map user has a voice packet recording requirement.
  • using the user's historical map usage behavior to identify target users with voice packet recording requirements includes:
  • the feature vector further includes a basic portrait of the user.
  • The behavior characteristics include at least one of travel-related behaviors, voice use behaviors, and voice packet-related behaviors;
  • the travel-related behaviors include time and location information of at least one of POI retrieval, navigation, and positioning;
  • the voice use behaviors include at least one of the frequency of using the voice function, the time of last use, and the voice functions used;
  • the voice packet-related behaviors include at least one of the number of times voice packets are used, the types of voice packets used, the state of voice packet recording, the time the voice packet was last recorded, and the frequency of visiting the voice packet recording page.
  • identifying the scenario where the target user uses the client includes:
  • the scene information of the target user using the client is acquired, and the scene information is recognized through the second classification model obtained by pre-training, and the recognition result of whether the scene information matches the voice packet recording scene is obtained.
  • the scene information includes at least one of the following:
  • time information and location information of the target user using the client, the time of the last POI retrieval, the time of the last navigation, whether the user is at a resident location, the status of recorded voice packets, the time of the last recorded voice packet, and response information to historical first guide information.
  • the method further includes:
  • the second guide information is sent to the user.
  • The present application provides a guiding device for the voice packet recording function, the device including:
  • a demand recognition unit, configured to use the historical map usage behaviors of map users to identify target users with voice packet recording requirements;
  • a scene recognition unit, configured to recognize the scene in which the target user uses the client;
  • a first guiding unit, configured to send first guide information of the voice packet recording function to the client if the scene in which the target user uses the client matches the voice packet recording scene.
  • this application provides an electronic device, including:
  • at least one processor; and
  • a memory communicatively connected with the at least one processor; wherein,
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the method described in any one of the above.
  • the present application provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to make the computer execute the method described in any one of the above.
  • Target users with voice packet recording requirements are identified through historical map usage behaviors, the scenarios in which these target users use the client are identified, and guide information for the voice packet recording function is sent only to users who have the requirement and are in a voice packet recording scene, thereby achieving precise guidance and reducing excessive disturbance to users.
  • the user's voice packet recording state is tracked and recorded, and used for subsequent user needs identification and scene identification, so as to achieve continuous user guidance and optimize the identification of user needs and scenes.
  • further guidance information can be sent to the user according to the recording status of the user's voice packet, so as to realize the recording encouragement to the user and the guidance of the recording process.
  • Figure 1 shows an exemplary system architecture to which the embodiments of the present application can be applied.
  • Figure 2 is a flowchart of a method provided by an embodiment of the application.
  • Figure 3 is a schematic diagram of first guide information provided by an embodiment of the application.
  • Figure 4 is a structural diagram of an apparatus provided by an embodiment of the application.
  • Figure 5 is a block diagram of an electronic device used to implement an embodiment of the present application.
  • FIG. 1 shows an exemplary system architecture of a guiding method for recording a voice packet function or a guiding device for recording a voice packet function according to an embodiment of the present application.
  • the system architecture may include terminal devices 101 and 102, a network 103 and a server 104.
  • the network 103 is used to provide a medium for communication links between the terminal devices 101 and 102 and the server 104.
  • the network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal devices 101 and 102 to interact with the server 104 through the network 103.
  • Various applications may be installed on the terminal devices 101 and 102, such as voice interactive applications, map applications, web browser applications, and communication applications.
  • The terminal devices 101 and 102 may be various electronic devices that support voice recording (that is, capable of collecting voice data entered by a user) and voice broadcasting, including but not limited to smartphones, tablets, and laptops.
  • the guiding device for recording the voice packet function provided by the present application can be set up and run in the server 104 mentioned above. It can be implemented as multiple software or software modules (for example, used to provide distributed services), or can be implemented as a single software or software module, which is not specifically limited here.
  • the server 104 may record the user's historical usage behavior of the map application through the client on the terminal device 101 or 102, and based on this, send the client the guidance information of the voice packet recording function.
  • the server 104 may be a single server or a server group composed of multiple servers. It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there can be any number of terminal devices, networks, and servers.
  • The core idea of this application is to use the user's historical map usage behaviors to identify users with voice packet recording requirements, and to recognize the scene in which such a user uses the client; if the scene matches a preset voice packet recording scene, guide information for the voice packet recording function is sent to the client. In other words, users are identified in terms of both needs and scenarios, and only users with a voice packet recording need, while in a voice packet recording scene, are guided, thereby reducing the excessive disturbance of users by promotional information.
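The two-stage idea above, a requirement check plus a scene check before any push, can be sketched as follows; the predicate functions, thresholds, and feature names are hypothetical stand-ins for the two classification models described later, not the patent's actual models.

```python
def has_recording_need(user_features: dict) -> bool:
    # Stand-in for the first (requirement) classification model.
    return user_features.get("voice_uses_per_week", 0) >= 3

def in_recording_scene(scene: dict) -> bool:
    # Stand-in for the second (scene) classification model:
    # recording wants a quiet, idle setting, e.g. at home in the evening.
    return scene.get("at_home", False) and scene.get("hour", 12) >= 20

def maybe_send_first_guide(user_features: dict, scene: dict) -> bool:
    """Send the first guide information only when both stages pass;
    this is what keeps the promotion from disturbing other users."""
    return has_recording_need(user_features) and in_recording_scene(scene)

print(maybe_send_first_guide({"voice_uses_per_week": 5},
                             {"at_home": True, "hour": 21}))   # True
print(maybe_send_first_guide({"voice_uses_per_week": 0},
                             {"at_home": True, "hour": 21}))   # False
```

A user failing either stage receives nothing, which is the precise-guidance property the paragraph claims.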
  • the following examples describe in detail the method provided in this application.
  • Figure 2 is a flowchart of the method provided by an embodiment of the application. As shown in Figure 2, the method may include the following steps:
  • the historical map usage behavior of the map user is used to identify the target user who has voice package recording requirements.
  • This step is to screen the target users, and the methods used can include but are not limited to the following two:
  • The first method: extract behavior features from the map user's historical map usage behavior to obtain the map user's feature vector; a first classification model obtained by pre-training then recognizes the map user based on this feature vector, yielding a recognition result of whether the map user has a voice packet recording requirement.
  • the feature vector in the above manner may further include a basic portrait of the user.
  • the user's basic portrait can be age, gender, job, and so on.
  • the behavior features extracted in this application may include, but are not limited to, at least one of travel-related behaviors, voice use behaviors, and voice package-related behaviors.
  • the travel-related behavior may include time and location information of at least one of POI retrieval, navigation, and positioning.
  • Users likely to use recorded voice packets often show certain characteristics in their map usage behavior. For example, parents with young children may use the voice packet recording function to record the child's voice as a voice packet; their behavior may show kindergarten locations in morning and evening positioning data, weekend navigation to training classes, searches for places suitable for outings with a baby, and so on.
  • College students may record a voice packet with their own voice or a partner's voice, and their positioning data will show that they are located at a certain college most of the time. Therefore, the time and location information of POI retrieval, navigation, and positioning can, to a large extent, reflect whether the user is a target user of the voice packet recording function.
  • the voice use behavior may include at least one of the frequency of using the voice function, the time of last use, and the voice function used. For example, some users often use the navigation broadcast function, so such users may be potential target users who use the voice packet function. For another example, some users often use maps through voice interaction, so such users may also be potential target users who use the voice package function.
  • Voice packet-related behaviors include at least one of the number of times voice packets are used, the types of voice packets used, the state of voice packet recording, the time the voice packet was last recorded, and the frequency of visiting the voice packet recording page. For example, a user who happily uses various voice packets and has a rich history of voice packet use is likely to be happy to record voice packets personally. As another example, a user who has used the voice packet recording function but has not recorded a new voice packet for a long time may be guided to record a new one. As another example, a user who did not finish the last voice packet recording is very likely to complete it under guidance, and so on.
  • the feature vector of the map user is input into a pre-trained classification model, and the classification model outputs a classification result based on the feature vector, that is, whether the map user has a voice packet recording requirement.
  • Positive and negative samples can be determined in advance through offline visits and telephone return visits; alternatively, the first guide information can be sent online to a small set of users, and whether each user responds to it determines the positive and negative samples.
  • Behavior features are then extracted from the historical map usage behavior of the positive-sample users and of the negative-sample users to obtain their feature vectors, and the classification model is trained on these.
  • the method for determining the feature vector of the positive and negative sample users is the same as the method for determining the feature vector of the map user described above, and will not be repeated here.
  • The classification model can adopt SVM (support vector machine), LR (logistic regression), and so on.
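As a rough illustration of the first classification model, the sketch below trains a tiny logistic-regression classifier (one of the model types named above) on hand-made behavior features; the feature names and sample values are invented for illustration, not taken from the patent.

```python
import numpy as np

# Hypothetical behavior features per user:
# [poi_searches_per_week, voice_uses_per_week, voice_packs_used]
X = np.array([
    [12.0, 9.0, 3.0],   # active, voice-heavy users, labelled positive
    [10.0, 7.0, 2.0],
    [1.0, 0.0, 0.0],    # sparse users, labelled negative
    [2.0, 1.0, 0.0],
])
y = np.array([1.0, 1.0, 0.0, 0.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=500):
    """Tiny gradient-descent logistic regression, a stand-in for the
    first classification model (the text also names SVM as an option)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b

def has_recording_need(x, w, b):
    """True if the model scores the user above 0.5."""
    return bool(sigmoid(x @ w + b) >= 0.5)

w, b = train_logistic(X, y)
print(has_recording_need(np.array([11.0, 8.0, 2.0]), w, b))  # True
print(has_recording_need(np.array([1.0, 0.0, 0.0]), w, b))   # False
```

In practice the positive/negative labels would come from the offline visits, return visits, or small-scale online pushes described above.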
  • The second method: extract behavior features from seed users in advance to obtain the seed users' feature vectors, and cluster the seed users based on these feature vectors to obtain user clusters; extract behavior features from the map user's historical map usage behavior to obtain the map user's feature vector; determine, based on that feature vector, whether the map user can be clustered into an existing user cluster; and identify map users clustered into an existing user cluster as target users with voice packet recording requirements.
  • a batch of users who have used the voice packet recording function can be pre-determined as seed users, and feature vectors can be extracted from these seed users.
  • the method of extracting the feature vector is the same as that described above, and will not be repeated.
  • After clustering, user clusters are obtained. These clusters represent typical user categories, and users in these categories are likely to use the voice packet recording function.
  • For each user cluster, a cluster-level feature vector can be computed uniformly, and the map user can then be matched against the existing user clusters based on the map user's feature vector. If the map user can be clustered into an existing user cluster, the user belongs to one of these typical user categories and has a high probability of using the voice packet recording function; if not, the probability of using the function is low.
  • This application does not limit the clustering method used; it may be, for example, K-means or EM (expectation maximization).
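A minimal sketch of the second method, under the assumption of K-means clustering plus a simple distance threshold for deciding whether a map user falls into an existing seed-user cluster; the 2-D feature vectors and the radius are invented for illustration.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Minimal K-means with a deterministic init on the first k points,
    standing in for clustering the seed users."""
    centers = X[:k].copy()
    for _ in range(iters):
        # Assign every point to its nearest center, then recompute centers.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers

# Hypothetical 2-D feature vectors of seed users who already used the
# voice packet recording function (two loose groups).
seeds = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.2],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centers = kmeans(seeds, k=2)

def is_target_user(x, centers, radius=1.0):
    """A map user counts as a target user only if the feature vector
    lands within `radius` of some existing seed-user cluster."""
    return bool(np.min(np.linalg.norm(centers - x, axis=1)) <= radius)

print(is_target_user(np.array([0.15, 0.1]), centers))  # True: near a cluster
print(is_target_user(np.array([2.5, 2.5]), centers))   # False: between clusters
```

The radius threshold is one concrete way to make "can be clustered into an existing user cluster" operational; a real system might instead use the clustering model's own assignment criterion.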
  • Voice packet recording has two characteristics: it is demanding on the environment, since recording is impossible where noise is too high; and it takes a relatively long time, so the user needs to be relatively idle.
  • Pushing to all users at the same time, as existing methods do, ignores these characteristics and easily disturbs users excessively.
  • a scene recognition mechanism is introduced in the embodiment of the present application.
  • A simple scene recognition method is to determine whether the current time and location at which the target user uses the client belong to a preset voice packet recording scene: for example, check whether the current time is after 8 p.m. or on a weekend, and whether the user is located at home.
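The simple rule-based check described above might look like the following sketch; the cutoffs (after 8 p.m., weekend, at a resident location) follow the example in the text, while the function signature is an assumption.

```python
from datetime import datetime

def in_recording_scene(now: datetime, at_resident_location: bool) -> bool:
    """Preset-scene check from the example: after 8 p.m. or on a weekend,
    and the user is positioned at a resident location (e.g. home)."""
    is_evening = now.hour >= 20
    is_weekend = now.weekday() >= 5  # 5 = Saturday, 6 = Sunday
    return at_resident_location and (is_evening or is_weekend)

print(in_recording_scene(datetime(2020, 5, 20, 21, 30), True))  # Wed evening: True
print(in_recording_scene(datetime(2020, 5, 20, 14, 0), True))   # Wed afternoon: False
print(in_recording_scene(datetime(2020, 5, 23, 10, 0), True))   # Saturday: True
```

The preferred approach in the next paragraphs replaces this hand-written rule with a trained second classification model over richer scene information.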
  • this application provides a preferred way to perform scene recognition.
  • the scene information of the target user using the client is acquired, and the scene information is recognized through the second classification model obtained by pre-training, and the recognition result of whether the scene information matches the voice packet recording scene is obtained.
  • the acquired scene information may include one or any combination of the following:
  • time information and location information of the target user using the client, the time of the last POI retrieval, the time of the last navigation, whether the user is at a resident location, the status of recorded voice packets, the time of the last recorded voice packet, response information to historical first guide information, and so on.
  • The first guide information can be sent online to a small set of users, and whether each user responds to it determines the positive and negative samples; the scene information of the positive-sample and negative-sample users is then obtained and used to train the classification model.
  • the method of acquiring the scene information of the users of the positive and negative samples is consistent with the method of acquiring the scene information of the target user described above.
  • the first guide information of the voice packet recording function is sent to the client.
  • the first guidance information may use one or any combination of text, pictures, page components, links, etc.
  • the user can conveniently enter the voice packet recording page for voice recording.
  • A combination of components, text, pictures, and links can be displayed on the client interface as the first guide information.
  • When the user clicks "click to record", the client jumps to the voice packet recording page.
  • The tracked recording state of the user's voice packet can, on the one hand, be used for subsequent identification of the user's needs and scenes, that is, to update the user's behavior features and scene features; on the other hand, it can be used to execute step 205.
  • the second guide information is sent to the user according to the recording state of the user's voice packet.
  • different copywriting can be pre-configured according to different recording states, and according to the user's voice packet recording state, the text information and voice information of the corresponding copywriting can be sent to the user as the second guide information.
  • For example, copy such as "... the voice packet can be downloaded in 15 minutes", and so on.
  • Such guide information can encourage users and help users such as children complete a recording.
  • Users who are unfamiliar with the recording process also learn what to do next.
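The state-to-copy configuration for the second guide information could be as simple as a lookup table; both the state names and the copy texts below are illustrative, not taken from the patent.

```python
# Hypothetical pre-configured copy per recording state.
SECOND_GUIDE_COPY = {
    "not_started": "Tap 'click to record' to start recording your voice packet.",
    "in_progress": "Sounding great! Only a few sentences left to record.",
    "abandoned": "You were almost done last time. Pick up where you left off?",
    "finished": "Recording complete; the voice packet can be downloaded in 15 minutes.",
}

def second_guide_info(recording_state: str) -> str:
    """Look up the second guide information for a tracked recording
    state, with a generic fallback for unknown states."""
    return SECOND_GUIDE_COPY.get(
        recording_state,
        "Visit the voice packet recording page to continue.")

print(second_guide_info("finished"))
```

The same table could hold both text and voice variants of each message, since the text says either form may be sent.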
  • Figure 4 is a structural diagram of the device provided by an embodiment of the application. As shown in Figure 4, the device may include a demand recognition unit 01, a scene recognition unit 02, and a first guiding unit 03, and may further include a recording and tracking unit 04 and a second guiding unit 05.
  • the main functions of each component are as follows:
  • the demand recognition unit 01 is used to use the historical map usage behavior of map users to identify target users who have voice packet recording requirements.
  • the demand identification unit 01 can adopt but not limited to the following two methods:
  • The first method: extract behavior features from the map user's historical map usage behavior to obtain the map user's feature vector; a first classification model obtained by pre-training then recognizes the map user based on this feature vector, yielding a recognition result of whether the map user has a voice packet recording requirement.
  • The second method: extract behavior features from seed users in advance to obtain the seed users' feature vectors, and cluster the seed users based on these feature vectors to obtain user clusters; extract behavior features from the map user's historical map usage behavior to obtain the map user's feature vector; determine, based on that feature vector, whether the map user can be clustered into an existing user cluster; and identify map users clustered into an existing user cluster as target users with voice packet recording requirements.
  • the aforementioned behavior characteristics may include at least one of travel-related behaviors, voice use behaviors, and voice package-related behaviors.
  • the travel-related behaviors include time and location information of at least one of POI retrieval, navigation, and positioning.
  • the voice use behavior includes at least one of the frequency of using the voice function, the time of last use, and the voice function used.
  • Voice packet-related behaviors include at least one of the number of times voice packets are used, the types of voice packets used, the state of voice packet recording, the time the voice packet was last recorded, and the frequency of visiting the voice packet recording page.
  • the feature vector in the above manner may further include a basic portrait of the user.
  • The user's basic portrait can include age, gender, occupation, and so on. After the behavior features and the basic user portrait are extracted for a map user, they are respectively encoded or mapped to obtain their corresponding feature vectors, which are then spliced (concatenated) to obtain the map user's feature vector.
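The encode-then-splice step might be sketched as follows; the portrait vocabularies, age buckets, and behavior fields are hypothetical choices, since the patent does not specify the encodings.

```python
import numpy as np

# Hypothetical vocabularies for the basic user portrait.
GENDERS = ["female", "male", "unknown"]
AGE_BUCKETS = [(0, 18), (18, 30), (30, 45), (45, 200)]

def one_hot(value, vocabulary):
    v = np.zeros(len(vocabulary))
    v[vocabulary.index(value)] = 1.0
    return v

def bucketize_age(age):
    v = np.zeros(len(AGE_BUCKETS))
    for i, (lo, hi) in enumerate(AGE_BUCKETS):
        if lo <= age < hi:
            v[i] = 1.0
    return v

def user_feature_vector(user):
    """Encode portrait and behavior features separately, then splice
    (concatenate) them into one vector, as the text describes."""
    portrait = np.concatenate([
        one_hot(user["gender"], GENDERS),
        bucketize_age(user["age"]),
    ])
    behavior = np.array([
        user["poi_searches_per_week"],
        user["voice_uses_per_week"],
        user["voice_packs_used"],
    ], dtype=float)
    return np.concatenate([portrait, behavior])

vec = user_feature_vector({
    "gender": "female", "age": 27,
    "poi_searches_per_week": 12, "voice_uses_per_week": 9,
    "voice_packs_used": 3,
})
print(vec.shape)  # (10,): 3 gender + 4 age buckets + 3 behavior counts
```

The resulting vector is what the first classification model (or the clustering of the second method) would consume.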
  • The device may also include a first model training unit (not shown in the figure) for obtaining training samples: positive and negative samples can be determined in advance through offline visits or telephone return visits, or by sending the first guide information online to a small set of users and observing whether each user responds. Behavior features are then extracted from the historical map usage behavior of the positive-sample and negative-sample users to obtain their feature vectors, and the first classification model is trained on them.
  • the scene recognition unit 02 is used for recognizing the scene in which the target user uses the client.
  • the scene recognition unit 02 can obtain the scene information of the target user using the client, and recognize the scene information through the second classification model obtained by pre-training, and obtain the recognition result of whether the scene information matches the voice packet recording scene.
  • The above-mentioned scene information may include at least one of the following: time information and location information of the target user using the client, the time of the last POI retrieval, the time of the last navigation, whether the user is at a resident location, the status of recorded voice packets, the time of the last recorded voice packet, and response information to historical first guide information.
  • this application may also include a second model training unit (not shown in the figure) for obtaining training samples.
  • The first guide information can be sent online to a small set of users, and whether each user responds to it determines the positive and negative samples; the scene information of the positive-sample and negative-sample users is then obtained and used to train the classification model, obtaining the second classification model.
  • the first guiding unit 03 is configured to send the first guiding information of the voice packet recording function to the client if the scene in which the target user uses the client meets the voice packet recording scenario.
  • the first guidance information may use one or any combination of text, pictures, page components, links, etc. Through the first guide information, the user can conveniently enter the voice packet recording page for voice recording.
  • the recording and tracking unit 04 is used to track and record the recording state of the user's voice packet after acquiring the event of the user's recording of the voice packet.
  • the second guiding unit 05 is configured to send second guiding information to the user according to the recording state of the user's voice packet.
  • different copywriting can be pre-configured according to different recording states, and according to the user's voice packet recording state, the text information and voice information of the corresponding copywriting can be sent to the user as the second guide information.
  • the present application also provides an electronic device and a readable storage medium.
  • Figure 5 is a block diagram of an electronic device according to the method of the embodiment of the present application.
  • Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices can also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementation of the application described and/or required herein.
  • the electronic device includes: one or more processors 501, a memory 502, and interfaces for connecting various components, including a high-speed interface and a low-speed interface.
  • the various components are connected to each other using different buses, and can be installed on a common motherboard or installed in other ways as needed.
  • the processor may process instructions executed in the electronic device, including instructions stored in or on the memory to display graphical information of the GUI on an external input/output device (such as a display device coupled to an interface).
  • in other implementations, multiple processors and/or multiple buses can be used together with multiple memories, if desired.
  • likewise, multiple electronic devices can be connected, with each device providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system).
  • in FIG. 5, one processor 501 is taken as an example.
  • the memory 502 is a non-transitory computer-readable storage medium provided by this application. Wherein, the memory stores instructions executable by at least one processor, so that the at least one processor executes the method provided in this application.
  • the non-transitory computer-readable storage medium of this application stores computer instructions, which are used to make a computer execute the method provided in this application.
  • the memory 502 can be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/units corresponding to the methods in the embodiments of the present application.
  • the processor 501 executes various functional applications and data processing of the server by running non-transitory software programs, instructions, and modules stored in the memory 502, that is, implements the methods in the foregoing method embodiments.
  • the memory 502 may include a program storage area and a data storage area.
  • the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the electronic device, and the like.
  • the memory 502 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
  • the memory 502 may optionally include memories remotely arranged relative to the processor 501; these remote memories may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the electronic device may further include: an input device 503 and an output device 504.
  • the processor 501, the memory 502, the input device 503, and the output device 504 may be connected by a bus or in other ways. In FIG. 5, the connection by a bus is taken as an example.
  • the input device 503 can receive input numeric or character information and generate key signal inputs related to the user settings and function control of the electronic device; examples include a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, and joystick.
  • the output device 504 may include a display device, an auxiliary lighting device (for example, LED), a tactile feedback device (for example, a vibration motor), and the like.
  • the display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some embodiments, the display device may be a touch screen.
  • Various implementations of the systems and techniques described herein can be realized in digital electronic circuit systems, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.
  • the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, device, and/or apparatus (for example, magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals.
  • machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • the systems and techniques described here can be implemented on a computer that has: a display device for displaying information to the user (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer.
  • Other types of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).
  • the systems and techniques described herein can be implemented in a computing system that includes a back-end component (for example, as a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system can be connected to each other through any form or medium of digital data communication (for example, a communication network). Examples of communication networks include: local area network (LAN), wide area network (WAN), and the Internet.
  • the computer system can include clients and servers.
  • the client and server are generally far away from each other and usually interact through a communication network.
  • the relationship between the client and the server is generated by computer programs running on the corresponding computers and having a client-server relationship with each other.

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Finance (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Probability & Statistics with Applications (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Navigation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A guiding method, apparatus, device and computer storage medium for a voice-packet recording function, relating to the field of big data. The specific method is: identifying, using map users' historical map usage behaviors, target users having a voice-packet recording demand; recognizing the scenario in which a target user uses the client, and, if it matches a voice-packet recording scenario, sending first guidance information for the voice-packet recording function to the client. The method achieves precise guidance toward the voice-packet recording function and reduces excessive disturbance to users.

Description

Guiding method, apparatus, device and computer storage medium for a voice-packet recording function
This application claims priority to Chinese patent application No. 201911140137.6, filed on November 20, 2019, entitled "Guiding Method, Apparatus, Device and Computer Storage Medium for a Voice-Packet Recording Function".
Technical Field
This application relates to the field of computer application technologies, and in particular to a guiding method, apparatus, device and computer storage medium for a voice-packet recording function in the field of big data technologies.
Background
This section is intended to provide background or context for the embodiments of the invention recited in the claims. The description herein is not admitted to be prior art merely by inclusion in this section.
With the continuous development of computer technology and users' ever-rising expectations of products, voice broadcast functions with different speakers, implemented with speech synthesis technology, have appeared in computer application products. For example, in map products, a user can choose among the different speakers provided by the map product for navigation voice broadcasts. These speakers are often well-known figures such as film and television actors, crosstalk performers and singers. However, as users' demand for personalization grows, a very small number of map applications have begun to offer users a function for recording personalized voice packets. A user can record the voice of himself, a family member or a friend as a voice packet, so that voice broadcasts are produced in the voice of himself, the family member or the friend.
However, recording a personalized voice packet is a brand-new, cutting-edge function of which most users are unaware. When this function is promoted to a broad user base, traditional approaches such as splash-screen promotion or pushing promotion information to users deliver the promotion to all users at the same time; all users frequently receive such promotion information, the targeting precision is poor, and some users are disturbed excessively.
Summary
In view of this, this application serves to reduce the excessive disturbance caused to users by promotion information.
In a first aspect, this application provides a guiding method for a voice-packet recording function, the method including:
identifying, using map users' historical map usage behaviors, a target user having a voice-packet recording demand;
recognizing the scenario in which the target user uses a client, and if it matches a voice-packet recording scenario, sending first guidance information for the voice-packet recording function to the client.
According to a preferred embodiment of this application, identifying, using map users' historical map usage behaviors, a target user having a voice-packet recording demand includes:
extracting behavior features from a map user's historical map usage behaviors to obtain a feature vector of the map user;
recognizing the map user based on the feature vector of the map user by a pre-trained first classification model, to obtain a recognition result of whether the map user has a voice-packet recording demand.
According to a preferred embodiment of this application, identifying, using users' historical map usage behaviors, a target user having a voice-packet recording demand includes:
extracting behavior features from seed users in advance to obtain feature vectors of the seed users, and clustering the seed users based on the feature vectors of the seed users to obtain user clusters;
extracting behavior features from a map user's historical map usage behaviors to obtain a feature vector of the map user;
determining, based on the feature vector of the map user, whether to cluster the map user into an existing user cluster;
identifying a map user clustered into an existing user cluster as a target user having a voice-packet recording demand.
According to a preferred embodiment of this application, the feature vector further includes a basic user profile.
According to a preferred embodiment of this application, the behavior features include at least one of travel-related behaviors, voice usage behaviors and voice-packet-related behaviors;
wherein the travel-related behaviors include time and location information of at least one of POI retrieval, navigation and positioning;
the voice usage behaviors include at least one of the frequency of using voice functions, the time of last use, and the voice functions used;
the voice-packet-related behaviors include at least one of the number of times voice packets are used, the types of voice packets used, the state of voice-packet recording, the time of the last voice-packet recording, and the frequency of visiting the voice-packet recording page.
According to a preferred embodiment of this application, recognizing the scenario in which the target user uses the client includes:
acquiring scenario information of the target user using the client, and recognizing the scenario information by a pre-trained second classification model, to obtain a recognition result of whether the scenario information matches a voice-packet recording scenario.
According to a preferred embodiment of this application, the scenario information includes at least one of the following:
time information and location information of the target user using the client, the time of the last POI retrieval, the time of the last navigation, whether the user is positioned at a resident location, the state of voice-packet recording, the time of the last voice-packet recording, and response information to historical first guidance information.
According to a preferred embodiment of this application, the method further includes:
after acquiring an event of the user recording a voice packet, tracking and recording the user's voice-packet recording state.
According to a preferred embodiment of this application, the method further includes:
sending second guidance information to the user according to the user's voice-packet recording state.
In a second aspect, this application provides a guiding apparatus for a voice-packet recording function, the apparatus including:
a demand recognition unit configured to identify, using map users' historical map usage behaviors, a target user having a voice-packet recording demand;
a scenario recognition unit configured to recognize the scenario in which the target user uses a client;
a first guiding unit configured to, if the scenario in which the target user uses the client matches a voice-packet recording scenario, send first guidance information for the voice-packet recording function to the client.
In a third aspect, this application provides an electronic device, including:
at least one processor; and
a memory communicatively connected with the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of the above.
In a fourth aspect, this application provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause the computer to perform the method according to any one of the above.
As can be seen from the above technical solutions, the method, apparatus, device and computer storage medium provided in this application can have the following advantages:
1) In this application, target users having a voice-packet recording demand are identified from historical map usage behaviors, and the scenarios in which the target users use the client are recognized; guidance information for the voice-packet recording function is sent only to users who have a voice-packet recording demand and whose scenario matches a voice-packet recording scenario, thereby achieving precise guidance and reducing excessive disturbance to users.
2) Compared with traditional approaches such as celebrity endorsement or on-the-ground promotion, the precise in-client guidance of this application greatly reduces cost and improves the conversion rate.
3) In this application, the user's voice-packet recording state is tracked and recorded and is used in subsequent demand recognition and scenario recognition for the user, thereby achieving continuous guidance of the user and optimizing the recognition of user demands and scenarios.
4) In this application, further guidance information can be sent to the user according to the user's voice-packet recording state, thereby encouraging the user's recording and guiding the recording process.
Other effects of the above optional manners will be described below in combination with specific embodiments.
Brief Description of the Drawings
The drawings are used for a better understanding of this solution and do not constitute a limitation of this application. In the drawings:
FIG. 1 shows an exemplary system architecture to which embodiments of this application can be applied;
FIG. 2 is a flowchart of a method provided by an embodiment of this application;
FIG. 3 is a schematic diagram of first guidance information provided by an embodiment of this application;
FIG. 4 is a structural diagram of an apparatus provided by an embodiment of this application;
FIG. 5 is a block diagram of an electronic device for implementing embodiments of this application.
Detailed Description
Exemplary embodiments of this application are described below with reference to the drawings, including various details of the embodiments to aid understanding; they should be regarded as merely exemplary. Therefore, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of this application. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.
FIG. 1 shows an exemplary system architecture to which the guiding method or guiding apparatus for a voice-packet recording function of embodiments of this application can be applied.
As shown in FIG. 1, the system architecture may include terminal devices 101 and 102, a network 103 and a server 104. The network 103 is the medium providing communication links between the terminal devices 101, 102 and the server 104. The network 103 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101 and 102 to interact with the server 104 through the network 103. Various applications may be installed on the terminal devices 101 and 102, such as voice interaction applications, map applications, web browser applications and communication applications.
The terminal devices 101 and 102 may be various electronic devices that support voice input (i.e., are able to collect voice data entered by a user) and voice broadcast, including but not limited to smartphones, tablet computers and notebook computers. The guiding apparatus for a voice-packet recording function provided in this application may be arranged in and run on the above server 104. It may be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module, which is not specifically limited here. The server 104 may record the user's historical usage behaviors with respect to the map application via the client on the terminal device 101 or 102, and on this basis send guidance information for the voice-packet recording function to the client.
The server 104 may be a single server or a server group composed of multiple servers. It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative; there may be any number of terminal devices, networks and servers according to implementation needs.
The core idea of this application is to identify, using users' historical map usage behaviors, users having a voice-packet recording demand; to recognize the scenario in which such a user uses the client; and, if it matches a preset voice-packet recording scenario, to send guidance information for the voice-packet recording function to the client. That is, users are screened both by demand and by scenario, and only users having a voice-packet recording demand are guided, and only in a voice-packet recording scenario, thereby reducing the excessive disturbance of promotion information to users. The method provided in this application is described in detail in the following embodiments.
FIG. 2 is a flowchart of a method provided by an embodiment of this application. As shown in FIG. 2, the method may include the following steps:
At 201, target users having a voice-packet recording demand are identified using map users' historical map usage behaviors.
For an application product, especially a popular one, the number of users is usually very large. If the voice-packet recording function were promoted to all users, as in the prior art, this would inevitably disturb users who have no such demand.
This step screens for target users; the manners adopted may include, but are not limited to, the following two:
First manner: behavior features are extracted from a map user's historical map usage behaviors to obtain a feature vector of the map user; the map user is recognized based on the feature vector by a pre-trained first classification model, yielding a recognition result of whether the map user has a voice-packet recording demand.
In the above manner, in addition to behavior features, the feature vector may further include a basic user profile, such as age, gender and occupation.
The behavior features are described below in detail. The behavior features extracted in this application may include, but are not limited to, at least one of travel-related behaviors, voice usage behaviors and voice-packet-related behaviors.
The travel-related behaviors may include time and location information of at least one of POI retrieval, navigation and positioning. Users who are likely to record voice packets often exhibit certain characteristics in their map usage behaviors. For example, parents of young children may use the recording function to record their child's voice as a voice packet; they tend to exhibit positioning data at a kindergarten in the morning and evening, navigate to training classes on weekends, retrieve places suitable for outings with children, and so on. As another example, college students may record voice packets in their own voice or a partner's voice; they tend to be positioned at a university most of the time. Therefore, the time and location information of POI retrieval, navigation and positioning can, to a large extent, reflect whether a user is a target user of the voice-packet recording function.
The voice usage behaviors may include at least one of the frequency of using voice functions, the time of last use, and the voice functions used. For example, users who often use the navigation broadcast function may be potential target users of the voice-packet function; likewise, users who often operate the map through voice interaction may also be potential target users.
The voice-packet-related behaviors include at least one of the number of times voice packets are used, the types of voice packets used, the state of voice-packet recording, the time of the last voice-packet recording, and the frequency of visiting the voice-packet recording page. For example, a user who enjoys using various voice packets and has a rich history of voice-packet use is likely to enjoy recording a voice packet of his own. As another example, a user who has used the recording function but has not recorded a new voice packet for a long time may record a new one under guidance. As yet another example, a user whose last recording was left unfinished is likely to finish it under guidance. And so on.
After the behavior features and basic user profile are extracted for a map user, they are separately encoded or mapped to obtain corresponding feature vectors, which are then concatenated to obtain the feature vector of the map user.
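The encode-then-concatenate step can be sketched as below. All field names, vocabularies and encodings here are illustrative assumptions for demonstration, not the patent's actual feature scheme:

```python
def one_hot(value, vocabulary):
    """Encode a categorical value as a one-hot list over a fixed vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

def build_feature_vector(user):
    """Encode each feature group separately, then concatenate (hypothetical fields)."""
    # Travel-related behavior: e.g. counts of POI retrieval / navigation / positioning.
    travel = [float(user.get(k, 0)) for k in ("poi_searches", "navigations", "locations")]
    # Voice usage behavior: frequency of voice functions and days since last use.
    voice = [float(user.get("voice_freq", 0)), float(user.get("days_since_voice", 0))]
    # Voice-packet behavior: usage count plus one-hot recording state.
    packet = [float(user.get("packet_uses", 0))]
    packet += one_hot(user.get("record_state", "none"), ["none", "partial", "done"])
    # Basic profile: one-hot gender plus age.
    profile = one_hot(user.get("gender", "unknown"), ["male", "female", "unknown"])
    profile.append(float(user.get("age", 0)))
    # Concatenate the per-group encodings into one feature vector.
    return travel + voice + packet + profile

vec = build_feature_vector({"poi_searches": 12, "record_state": "partial", "age": 30})
```

A real system would choose its own feature groups, vocabularies and normalization; the point is only that each group is encoded independently and the results are concatenated.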
The feature vector of the map user is input into the pre-trained classification model, and the classification model outputs a classification result based on the feature vector, i.e., whether the map user has a voice-packet recording demand.
During training of the classification model, positive and negative samples may be determined in advance through offline visits or telephone follow-ups; alternatively, the first guidance information may be sent online to a small range of users, and positive and negative samples are determined by whether the users respond to it. Behavior features are then extracted from the historical map usage behaviors of the positive-sample users to obtain their feature vectors, and likewise from those of the negative-sample users, so as to train the classification model. The feature vectors of positive- and negative-sample users are determined in the same way as the feature vector of a map user described above, which is not repeated here. The classification model may be, for example, an SVM (Support Vector Machine) or an LR (Logistic Regression) model.
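As an illustrative sketch of the first manner, the toy code below trains a logistic-regression classifier (one of the model choices named above, alongside SVM) on positive/negative feature vectors and then classifies a new user vector. The training loop, feature values and decision threshold are assumptions for demonstration, not the patent's actual model:

```python
import math

def train_logistic_regression(samples, labels, epochs=300, lr=0.1):
    """Plain SGD training of a tiny logistic-regression classifier."""
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """True if the user is classified as having a recording demand."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z)) >= 0.5

# Toy positive/negative samples standing in for feature vectors of users who
# did / did not respond to the first guidance information.
pos = [[1.0, 0.9], [0.9, 1.0]]
neg = [[0.1, 0.0], [0.0, 0.2]]
w, b = train_logistic_regression(pos + neg, [1, 1, 0, 0])
```

In practice an off-the-shelf SVM or LR implementation would be used on the real sample feature vectors; this sketch only illustrates the train-then-classify flow.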
Second manner: behavior features are extracted from seed users in advance to obtain feature vectors of the seed users, and the seed users are clustered based on their feature vectors to obtain user clusters; behavior features are extracted from a map user's historical map usage behaviors to obtain the map user's feature vector; whether to cluster the map user into an existing user cluster is determined based on the map user's feature vector; map users clustered into existing user clusters are identified as target users having a voice-packet recording demand.
A batch of users who have used the voice-packet recording function may be determined in advance as seed users, and feature vectors are extracted from these seed users in the same way as described above, which is not repeated here. Clustering the seed users based on their feature vectors yields user clusters, which in fact represent fairly typical user categories that are highly likely to use the voice-packet recording function. After the user clusters are obtained, a feature vector can be computed uniformly for each category, and a map user is then clustered into an existing user cluster based on his feature vector. If the map user can be clustered into an existing user cluster, he belongs to these fairly typical user categories and is highly likely to use the voice-packet recording function; if not, he does not belong to these categories and is unlikely to use it.
This application does not limit the clustering method used; clustering methods such as K-means (k-means clustering) and EM (Expectation Maximization) may be used.
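The second manner can be sketched as follows: each seed-user cluster is summarized by its centroid, and a map user is assigned to a cluster only if his vector is close enough to some centroid. The distance threshold and toy vectors are illustrative assumptions; a real system would cluster the seed users with K-means or EM as noted above:

```python
def centroid(points):
    """Mean vector of a cluster of seed-user feature vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_to_cluster(user_vec, centroids, max_sq_dist):
    """Index of the nearest seed-user cluster, or None if the user is too far
    from every cluster (i.e. not identified as a target user)."""
    best = min(range(len(centroids)), key=lambda i: sq_dist(user_vec, centroids[i]))
    return best if sq_dist(user_vec, centroids[best]) <= max_sq_dist else None

# Two toy seed-user clusters, e.g. "parents of young children" and "students".
clusters = [[[0.9, 0.1], [1.0, 0.2]], [[0.1, 0.9], [0.2, 1.0]]]
centroids = [centroid(c) for c in clusters]
```

A map user assigned to some cluster would then be treated as a target user having a voice-packet recording demand.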
At 202, the scenario in which the target user uses the client is recognized.
Voice-packet recording has two characteristics: it is demanding on the environment, and cannot be done where the noise is too high; and it takes a relatively long time, requiring the user to be relatively free. Since users differ in their behavior patterns, not all users are suited to being guided in the same scenario; the existing practice of pushing to all users at the same time is therefore inappropriate and easily disturbs users excessively. In view of this, a scenario recognition mechanism is introduced in embodiments of this application.
In this step, a simple scenario recognition manner may be adopted: judging whether the current time and location at which the target user uses the client belong to a preset voice-packet recording scenario, for example, whether the current time is after eight o'clock in the evening or on a weekend, and the user is positioned at home.
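The simple time-and-location rule described in this step might look like the sketch below. The 8 p.m. cutoff and weekend check come from the example above, while the `at_home` flag is an assumed input produced by positioning:

```python
from datetime import datetime

def is_recording_scene(now, at_home, evening_from=20):
    """Rule-based check: user is at home, and it is either a weekend or evening."""
    if not at_home:
        return False
    is_weekend = now.weekday() >= 5  # Saturday=5, Sunday=6
    is_evening = now.hour >= evening_from
    return is_weekend or is_evening

ok = is_recording_scene(datetime(2020, 5, 23, 10, 0), at_home=True)  # a Saturday morning
```

As the next paragraphs note, such fixed rules are only the simplest option; a trained classifier over richer scenario information is the preferred manner.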
However, beyond the above simple judgment by time-and-location rules, and considering the complexity of users' behavior habits, the use of scenario information needs to be richer and deeper. This application therefore provides a preferred manner of scenario recognition:
Scenario information of the target user using the client is acquired and recognized by a pre-trained second classification model, yielding a recognition result of whether the scenario information matches a voice-packet recording scenario. The acquired scenario information may include one or any combination of the following:
time information and location information of the target user using the client, the time of the last POI retrieval, the time of the last navigation, whether the user is positioned at a resident location, the state of voice-packet recording, the time of the last voice-packet recording, response information to historical first guidance information, and so on.
When training the second classification model, the first guidance information may be sent online to a small range of users, and positive and negative samples are determined by whether the users respond to it. The scenario information of the positive-sample users and that of the negative-sample users are then acquired to train the classification model. The scenario information of positive- and negative-sample users is acquired in the same way as that of the target user described above.
In addition, it should be noted that the "first" and "second" in the "first classification model", "second classification model", "first guidance information" and "second guidance information" of this embodiment carry no meaning of quantity or order; they merely distinguish the names.
At 203, if a match with a voice-packet recording scenario is recognized, first guidance information for the voice-packet recording function is sent to the client.
The first guidance information may adopt one or any combination of text, pictures, page components, links, etc. Through the first guidance information, the user can conveniently enter the voice-packet recording page for voice recording. For example, as shown in FIG. 3, a component combined with text, pictures and links may be displayed on the client interface to present the first guidance information. When the user clicks "Tap to record" therein, the client jumps to the voice-packet recording page.
At 204, after an event of the user recording a voice packet is acquired, the user's voice-packet recording state is tracked and recorded.
Tracking and recording the user's voice-packet recording state can, on the one hand, be used in subsequent demand recognition and scenario recognition for this user, i.e., updating the user's behavior features and scenario features; on the other hand, it can be used to perform step 205.
At 205, second guidance information is sent to the user according to the user's voice-packet recording state.
In this application, different copy can be pre-configured for different recording states, and text information, voice information, etc. of the corresponding copy is sent to the user as the second guidance information according to the user's voice-packet recording state.
For example, if the user has only 5 sentences left to record, a voice message such as "Victory is in sight, keep going!" may be sent to the user. If the user finishes recording, a message such as "Great job! Recording complete; the voice packet can be downloaded in 15 minutes" may be sent. Such guidance information can, on the one hand, encourage users and help users such as children finish recording in one session; on the other hand, it lets users unfamiliar with the recording process know what to do next.
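The state-to-copy mapping described above can be sketched minimally as follows; the state thresholds and message texts mirror the examples in the text but are otherwise assumptions:

```python
# Pre-configured copy per recording state (illustrative texts and thresholds).
SECOND_GUIDANCE = {
    "almost_done": "Victory is in sight, keep going!",
    "finished": "Great job! Recording complete; the voice packet can be "
                "downloaded in 15 minutes.",
}

def second_guidance(sentences_left):
    """Pick the second guidance message for the user's current recording state."""
    if sentences_left == 0:
        return SECOND_GUIDANCE["finished"]
    if sentences_left <= 5:
        return SECOND_GUIDANCE["almost_done"]
    return None  # no extra guidance mid-recording

msg = second_guidance(5)
```

A real deployment would likely key the copy on a richer recording-state record (tracked in step 204) and deliver it as text or synthesized voice.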
The above is a detailed description of the method provided in this application. The apparatus provided in this application is described in detail below with reference to embodiments.
FIG. 4 is a structural diagram of an apparatus provided by an embodiment of this application. As shown in FIG. 4, the apparatus may include a demand recognition unit 01, a scenario recognition unit 02 and a first guiding unit 03, and may further include a recording tracking unit 04 and a second guiding unit 05. The main functions of the component units are as follows:
The demand recognition unit 01 is configured to identify, using map users' historical map usage behaviors, target users having a voice-packet recording demand.
The demand recognition unit 01 may adopt, but is not limited to, the following two manners:
First manner: extracting behavior features from a map user's historical map usage behaviors to obtain a feature vector of the map user; recognizing the map user based on the feature vector by a pre-trained first classification model, to obtain a recognition result of whether the map user has a voice-packet recording demand.
Second manner: extracting behavior features from seed users in advance to obtain feature vectors of the seed users, and clustering the seed users based on their feature vectors to obtain user clusters; extracting behavior features from a map user's historical map usage behaviors to obtain the map user's feature vector; determining, based on the map user's feature vector, whether to cluster the map user into an existing user cluster; identifying map users clustered into existing user clusters as target users having a voice-packet recording demand.
The above behavior features may include at least one of travel-related behaviors, voice usage behaviors and voice-packet-related behaviors.
The travel-related behaviors include time and location information of at least one of POI retrieval, navigation and positioning. The voice usage behaviors include at least one of the frequency of using voice functions, the time of last use, and the voice functions used. The voice-packet-related behaviors include at least one of the number of times voice packets are used, the types of voice packets used, the state of voice-packet recording, the time of the last voice-packet recording, and the frequency of visiting the voice-packet recording page.
In the above manners, in addition to behavior features, the feature vector may further include a basic user profile, such as age, gender and occupation. After the behavior features and basic user profile are extracted for a map user, they are separately encoded or mapped to obtain corresponding feature vectors, which are then concatenated to obtain the feature vector of the map user.
In addition, corresponding to the above first manner, the apparatus may further include a first model training unit (not shown in the figure) configured to acquire training samples. The training samples may be determined as positive and negative samples in advance through offline visits or telephone follow-ups; alternatively, the first guidance information may be sent online to a small range of users, and positive and negative samples are determined by whether the users respond to it. Behavior features are then extracted from the historical map usage behaviors of the positive-sample users to obtain their feature vectors, and likewise from those of the negative-sample users, so as to train the classification model and obtain the first classification model.
The scenario recognition unit 02 is configured to recognize the scenario in which the target user uses the client.
Specifically, the scenario recognition unit 02 may acquire scenario information of the target user using the client and recognize it by a pre-trained second classification model, to obtain a recognition result of whether the scenario information matches a voice-packet recording scenario.
The above scenario information may include at least one of: time information and location information of the target user using the client, the time of the last POI retrieval, the time of the last navigation, whether the user is positioned at a resident location, the state of voice-packet recording, the time of the last voice-packet recording, and response information to historical first guidance information.
Correspondingly, this application may further include a second model training unit (not shown in the figure) configured to acquire training samples; for example, the first guidance information may be sent online to a small range of users, and positive and negative samples are determined by whether the users respond to it. The scenario information of the positive-sample users and that of the negative-sample users are then acquired to train the classification model and obtain the second classification model.
The first guiding unit 03 is configured to, if the scenario in which the target user uses the client matches a voice-packet recording scenario, send first guidance information for the voice-packet recording function to the client.
The first guidance information may adopt one or any combination of text, pictures, page components, links, etc. Through the first guidance information, the user can conveniently enter the voice-packet recording page for voice recording.
The recording tracking unit 04 is configured to, after acquiring an event of the user recording a voice packet, track and record the user's voice-packet recording state.
The second guiding unit 05 is configured to send second guidance information to the user according to the user's voice-packet recording state. In this application, different copy can be pre-configured for different recording states, and text information, voice information, etc. of the corresponding copy is sent to the user as the second guidance information according to the user's voice-packet recording state.
According to embodiments of this application, this application further provides an electronic device and a readable storage medium.
FIG. 5 is a block diagram of an electronic device for the method according to an embodiment of this application. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementation of this application described and/or claimed herein.
As shown in FIG. 5, the electronic device includes: one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as needed. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other implementations, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system). One processor 501 is taken as an example in FIG. 5.
The memory 502 is the non-transitory computer-readable storage medium provided in this application, storing instructions executable by at least one processor so that the at least one processor performs the method provided in this application. The non-transitory computer-readable storage medium of this application stores computer instructions for causing a computer to perform the method provided in this application.
As a non-transitory computer-readable storage medium, the memory 502 may be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/units corresponding to the methods in embodiments of this application. By running the non-transitory software programs, instructions and modules stored in the memory 502, the processor 501 executes the various functional applications and data processing of the server, i.e., implements the methods in the above method embodiments.
The memory 502 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the electronic device, etc. In addition, the memory 502 may include a high-speed random access memory and may also include a non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 502 may optionally include memories remotely arranged relative to the processor 501, and these remote memories may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device may further include an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 5.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device, and may be, for example, a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, joystick or other input device. The output device 504 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), etc. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here may be realized in digital electronic circuit systems, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor capable of receiving data and instructions from a storage system, at least one input device and at least one output device, and transmitting data and instructions to the storage system, the at least one input device and the at least one output device.
These computer programs (also called programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, device, and/or apparatus (e.g., magnetic disks, optical disks, memories, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with the user, the systems and techniques described here may be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
The systems and techniques described here may be implemented in a computing system that includes a back-end component (e.g., as a data server), or a computing system that includes a middleware component (e.g., an application server), or a computing system that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described here), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN) and the Internet.
A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other.
It should be understood that steps may be reordered, added or deleted using the various forms of flows shown above. For example, the steps described in this application may be performed in parallel, sequentially or in different orders, as long as the desired results of the technical solutions disclosed in this application can be achieved, which is not limited here.
The above specific implementations do not constitute a limitation on the protection scope of this application. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this application shall be included in the protection scope of this application.

Claims (18)

  1. A guiding method for a voice-packet recording function, characterized in that the method comprises:
    identifying, using map users' historical map usage behaviors, a target user having a voice-packet recording demand;
    recognizing the scenario in which the target user uses a client, and if it matches a voice-packet recording scenario, sending first guidance information for the voice-packet recording function to the client.
  2. The method according to claim 1, characterized in that identifying, using map users' historical map usage behaviors, a target user having a voice-packet recording demand comprises:
    extracting behavior features from a map user's historical map usage behaviors to obtain a feature vector of the map user;
    recognizing the map user based on the feature vector of the map user by a pre-trained first classification model, to obtain a recognition result of whether the map user has a voice-packet recording demand.
  3. The method according to claim 1, characterized in that identifying, using users' historical map usage behaviors, a target user having a voice-packet recording demand comprises:
    extracting behavior features from seed users in advance to obtain feature vectors of the seed users, and clustering the seed users based on the feature vectors of the seed users to obtain user clusters;
    extracting behavior features from a map user's historical map usage behaviors to obtain a feature vector of the map user;
    determining, based on the feature vector of the map user, whether to cluster the map user into an existing user cluster;
    identifying a map user clustered into an existing user cluster as a target user having a voice-packet recording demand.
  4. The method according to claim 2 or 3, characterized in that the feature vector further comprises a basic user profile.
  5. The method according to claim 2 or 3, characterized in that the behavior features comprise at least one of travel-related behaviors, voice usage behaviors and voice-packet-related behaviors;
    wherein the travel-related behaviors comprise time and location information of at least one of POI retrieval, navigation and positioning;
    the voice usage behaviors comprise at least one of the frequency of using voice functions, the time of last use, and the voice functions used;
    the voice-packet-related behaviors comprise at least one of the number of times voice packets are used, the types of voice packets used, the state of voice-packet recording, the time of the last voice-packet recording, and the frequency of visiting the voice-packet recording page.
  6. The method according to claim 1, characterized in that recognizing the scenario in which the target user uses the client comprises:
    acquiring scenario information of the target user using the client, and recognizing the scenario information by a pre-trained second classification model, to obtain a recognition result of whether the scenario information matches a voice-packet recording scenario.
  7. The method according to claim 6, characterized in that the scenario information comprises at least one of the following:
    time information and location information of the target user using the client, the time of the last POI retrieval, the time of the last navigation, whether the user is positioned at a resident location, the state of voice-packet recording, the time of the last voice-packet recording, and response information to historical first guidance information.
  8. The method according to claim 1, characterized in that the method further comprises:
    after acquiring an event of the user recording a voice packet, tracking and recording the user's voice-packet recording state.
  9. The method according to claim 8, characterized in that the method further comprises:
    sending second guidance information to the user according to the user's voice-packet recording state.
  10. A guiding apparatus for a voice-packet recording function, characterized in that the apparatus comprises:
    a demand recognition unit configured to identify, using map users' historical map usage behaviors, a target user having a voice-packet recording demand;
    a scenario recognition unit configured to recognize the scenario in which the target user uses a client;
    a first guiding unit configured to, if the scenario in which the target user uses the client matches a voice-packet recording scenario, send first guidance information for the voice-packet recording function to the client.
  11. The apparatus according to claim 10, characterized in that the demand recognition unit is specifically configured to:
    extract behavior features from a map user's historical map usage behaviors to obtain a feature vector of the map user;
    recognize the map user based on the feature vector of the map user by a pre-trained first classification model, to obtain a recognition result of whether the map user has a voice-packet recording demand.
  12. The apparatus according to claim 10, characterized in that the demand recognition unit is specifically configured to:
    extract behavior features from seed users in advance to obtain feature vectors of the seed users, and cluster the seed users based on the feature vectors of the seed users to obtain user clusters;
    extract behavior features from a map user's historical map usage behaviors to obtain a feature vector of the map user;
    determine, based on the feature vector of the map user, whether to cluster the map user into an existing user cluster;
    identify a map user clustered into an existing user cluster as a target user having a voice-packet recording demand.
  13. The apparatus according to claim 11 or 12, characterized in that the behavior features comprise at least one of travel-related behaviors, voice usage behaviors and voice-packet-related behaviors;
    wherein the travel-related behaviors comprise time and location information of at least one of POI retrieval, navigation and positioning;
    the voice usage behaviors comprise at least one of the frequency of using voice functions, the time of last use, and the voice functions used;
    the voice-packet-related behaviors comprise at least one of the number of times voice packets are used, the types of voice packets used, the state of voice-packet recording, the time of the last voice-packet recording, and the frequency of visiting the voice-packet recording page.
  14. The apparatus according to claim 10, characterized in that the scenario recognition unit is specifically configured to acquire scenario information of the target user using the client, and recognize the scenario information by a pre-trained second classification model, to obtain a recognition result of whether the scenario information matches a voice-packet recording scenario;
    the scenario information comprises at least one of the following:
    time information and location information of the target user using the client, the time of the last POI retrieval, the time of the last navigation, whether the user is positioned at a resident location, the state of voice-packet recording, the time of the last voice-packet recording, and response information to historical first guidance information.
  15. The apparatus according to claim 10, characterized in that the apparatus further comprises:
    a recording tracking unit configured to, after acquiring an event of the user recording a voice packet, track and record the user's voice-packet recording state.
  16. The apparatus according to claim 15, characterized in that the apparatus further comprises:
    a second guiding unit configured to send second guidance information to the user according to the user's voice-packet recording state.
  17. An electronic device, characterized by comprising:
    at least one processor; and
    a memory communicatively connected with the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-9.
  18. A non-transitory computer-readable storage medium storing computer instructions, characterized in that the computer instructions are used to cause the computer to perform the method according to any one of claims 1-9.
PCT/CN2020/092155 2019-11-20 2020-05-25 录制语音包功能的引导方法、装置、设备和计算机存储介质 WO2021098175A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP20820758.9A EP3851803B1 (en) 2019-11-20 2020-05-25 Method and apparatus for guiding speech packet recording function, device, and computer storage medium
US17/254,814 US11976931B2 (en) 2019-11-20 2020-05-25 Method and apparatus for guiding voice-packet recording function, device and computer storage medium
JP2021515173A JP7225380B2 (ja) 2019-11-20 2020-05-25 音声パケット記録機能のガイド方法、装置、デバイス、プログラム及びコンピュータ記憶媒体
KR1020217008183A KR102440635B1 (ko) 2019-11-20 2020-05-25 음성 패킷 녹취 기능의 안내 방법, 장치, 기기 및 컴퓨터 저장 매체

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911140137.6 2019-11-20
CN201911140137.6A CN112825256B (zh) 2019-11-20 录制语音包功能的引导方法、装置、设备和计算机存储介质

Publications (1)

Publication Number Publication Date
WO2021098175A1 true WO2021098175A1 (zh) 2021-05-27

Family

ID=74859163

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092155 WO2021098175A1 (zh) 2019-11-20 2020-05-25 录制语音包功能的引导方法、装置、设备和计算机存储介质

Country Status (5)

Country Link
US (1) US11976931B2 (zh)
EP (1) EP3851803B1 (zh)
JP (1) JP7225380B2 (zh)
KR (1) KR102440635B1 (zh)
WO (1) WO2021098175A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093753A (zh) * 2012-12-14 2013-05-08 沈阳美行科技有限公司 一种导航系统用户语音自定义方法
CN103674012A (zh) * 2012-09-21 2014-03-26 高德软件有限公司 语音定制方法及其装置、语音识别方法及其装置
CN105606117A (zh) * 2014-11-18 2016-05-25 深圳市腾讯计算机系统有限公司 导航提示方法及装置
US20170328734A1 (en) * 2016-05-12 2017-11-16 Tata Consultancy Services Limited Systems and methods for generating signature ambient sounds and maps thereof
CN109886719A (zh) * 2018-12-20 2019-06-14 平安科技(深圳)有限公司 基于网格的数据挖掘处理方法、装置和计算机设备

Family Cites Families (14)

Publication number Priority date Publication date Assignee Title
JP2001289661A (ja) * 2000-04-07 2001-10-19 Alpine Electronics Inc ナビゲーション装置
NL1031202C2 (nl) 2006-02-21 2007-08-22 Tomtom Int Bv Navigatieapparaat en werkwijze voor het ontvangen en afspelen van geluidsmonsters.
JP5181533B2 (ja) 2007-05-21 2013-04-10 トヨタ自動車株式会社 音声対話装置
JP2010078851A (ja) 2008-09-25 2010-04-08 Nissan Motor Co Ltd 音声入力装置及び音声入力方法
KR101070785B1 (ko) 2009-02-09 2011-10-06 현대엠엔소프트 주식회사 사용자 개인 음성/동영상을 이용한 네비게이션 음성/영상 안내 서비스 방법 및 그 서비스를 위한 서버
WO2012068280A1 (en) * 2010-11-16 2012-05-24 Echo-Sense Inc. Remote guidance system
US9230556B2 (en) * 2012-06-05 2016-01-05 Apple Inc. Voice instructions during navigation
KR20140098947A (ko) 2013-01-31 2014-08-11 삼성전자주식회사 광고 제공 시스템, 사용자 단말 및 광고 제공 방법
US20140365068A1 (en) 2013-06-06 2014-12-11 Melvin Burns Personalized Voice User Interface System and Method
KR101655166B1 (ko) 2014-10-16 2016-09-07 현대자동차 주식회사 차량의 디지털 매뉴얼 제공 시스템 및 방법
KR101736109B1 (ko) * 2015-08-20 2017-05-16 현대자동차주식회사 음성인식 장치, 이를 포함하는 차량, 및 그 제어방법
US10007269B1 (en) 2017-06-23 2018-06-26 Uber Technologies, Inc. Collision-avoidance system for autonomous-capable vehicle
CN110049079A (zh) 2018-01-16 2019-07-23 阿里巴巴集团控股有限公司 信息推送及模型训练方法、装置、设备及存储介质
CN110275692A (zh) 2019-05-20 2019-09-24 北京百度网讯科技有限公司 一种语音指令的推荐方法、装置、设备和计算机存储介质

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN103674012A (zh) * 2012-09-21 2014-03-26 高德软件有限公司 语音定制方法及其装置、语音识别方法及其装置
CN103093753A (zh) * 2012-12-14 2013-05-08 沈阳美行科技有限公司 一种导航系统用户语音自定义方法
CN105606117A (zh) * 2014-11-18 2016-05-25 深圳市腾讯计算机系统有限公司 导航提示方法及装置
US20170328734A1 (en) * 2016-05-12 2017-11-16 Tata Consultancy Services Limited Systems and methods for generating signature ambient sounds and maps thereof
CN109886719A (zh) * 2018-12-20 2019-06-14 平安科技(深圳)有限公司 基于网格的数据挖掘处理方法、装置和计算机设备

Non-Patent Citations (1)

Title
See also references of EP3851803A4

Also Published As

Publication number Publication date
CN112825256A (zh) 2021-05-21
US11976931B2 (en) 2024-05-07
KR20210065100A (ko) 2021-06-03
US20220276067A1 (en) 2022-09-01
EP3851803B1 (en) 2022-09-28
JP7225380B2 (ja) 2023-02-20
EP3851803A4 (en) 2021-11-03
EP3851803A1 (en) 2021-07-21
JP2022512271A (ja) 2022-02-03
KR102440635B1 (ko) 2022-09-05

Similar Documents

Publication Publication Date Title
US11810576B2 (en) Personalization of experiences with digital assistants in communal settings through voice and query processing
US11416777B2 (en) Utterance quality estimation
US11657094B2 (en) Memory grounded conversational reasoning and question answering for assistant systems
EP3648099B1 (en) Voice recognition method, device, apparatus, and storage medium
CN109564571B (zh) 利用搜索上下文的查询推荐方法及系统
CN111221984A (zh) 多模态内容处理方法、装置、设备及存储介质
CN114600114A (zh) 助理系统的设备上卷积神经网络模型
CN108369580B (zh) 针对屏幕上项目选择的基于语言和域独立模型的方法
CN112313680A (zh) 助理系统中手势输入的自动完成
WO2016004763A1 (zh) 业务推荐方法和具有智能助手的装置
US11449682B2 (en) Adjusting chatbot conversation to user personality and mood
US11556698B2 (en) Augmenting textual explanations with complete discourse trees
JP2021131528A (ja) ユーザ意図認識方法、装置、電子機器、コンピュータ可読記憶媒体及びコンピュータプログラム
WO2014176750A1 (en) Reminder setting method, apparatus and system
WO2021139221A1 (zh) 查询自动补全的方法、装置、设备和计算机存储介质
CN112269867A (zh) 用于推送信息的方法、装置、设备以及存储介质
CN111460296B (zh) 用于更新事件集合的方法和装置
US20210233520A1 (en) Contextual multi-channel speech to text
TW202301080A (zh) 輔助系統的多裝置調解
CN112015866A (zh) 用于生成同义文本的方法、装置、电子设备及存储介质
WO2024036616A1 (zh) 一种基于终端的问答方法及装置
CN110832444B (zh) 用户界面声音发出活动分类
WO2021098175A1 (zh) 录制语音包功能的引导方法、装置、设备和计算机存储介质
US11475221B2 (en) Techniques for selecting content to include in user communications
US11574246B2 (en) Updating training examples for artificial intelligence

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2020820758

Country of ref document: EP

Effective date: 20201217

ENP Entry into the national phase

Ref document number: 2021515173

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE