WO2023071768A1 - Arrival reminder method, apparatus, terminal, storage medium, and program product - Google Patents

Arrival reminder method, apparatus, terminal, storage medium, and program product

Info

Publication number
WO2023071768A1
WO2023071768A1 · PCT/CN2022/124453 · CN2022124453W
Authority
WO
WIPO (PCT)
Prior art keywords
features
global
inertial sensor
sound
feature
Prior art date
Application number
PCT/CN2022/124453
Other languages
English (en)
French (fr)
Inventor
刘文龙
Original Assignee
上海瑾盛通信科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海瑾盛通信科技有限公司 filed Critical 上海瑾盛通信科技有限公司
Publication of WO2023071768A1


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34 Route searching; Route guidance
    • G01C21/3453 Special cost functions, i.e. other than distance or default speed limit of road segments
    • G01C21/3461 Preferred or disfavoured areas, e.g. dangerous zones, toll or emission zones, intersections, manoeuvre types, segments such as motorways, toll roads, ferries
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments

Definitions

  • the present disclosure relates to the technical field of terminals, and in particular to an arrival reminder method, device, terminal, storage medium and program product.
  • the terminal can have an arrival reminder function to remind passengers when they arrive at the target station, so that they can get off in time.
  • the terminal usually uses an embedded accelerometer to collect acceleration, and determines the acceleration of the current vehicle in real time from the values the accelerometer records. For example, if the terminal detects that the acceleration is greater than zero, it judges that the vehicle is in the start-up phase; if it detects that the acceleration is less than zero, it judges that the vehicle is decelerating into the station. It then judges whether the user is arriving at the station or needs to transfer based on the subway line map and the user's needs, and provides an arrival or transfer reminder.
  • however, judging arrival from the acceleration direction recorded by the terminal's accelerometer depends heavily on the attitude of the mobile phone: it is difficult to accurately judge whether the subway is accelerating or decelerating from the values the accelerometer records, which leads to inaccurate arrival reminders.
  • the embodiment of the present application provides an arrival reminder method, device, terminal, storage medium, and program product, which can improve the accuracy of judging the operating status of public transport vehicles and thereby improve the accuracy of the terminal's arrival reminder. The technical solution is as follows:
  • an embodiment of the present application provides an arrival reminder method, the method is executed by a terminal, and the method includes:
  • the traffic operation information is used to indicate the operation status of the public transport within the target time period;
  • an arrival reminder device comprising:
  • a data acquisition module configured to acquire environmental sound data within a target time period and inertial sensor data within the target time period;
  • a first feature extraction module configured to perform feature extraction on the environmental sound data based on the time series of the environmental sound data to obtain global sound features;
  • a second feature extraction module configured to perform feature extraction on the inertial sensor data based on the time series of the inertial sensor data to obtain global inertial sensor features;
  • a feature fusion module configured to fuse the global sound features and the global inertial sensor features based on a self-attention mechanism to obtain fusion features;
  • an information acquisition module configured to acquire traffic operation information based on the fusion features, the traffic operation information being used to indicate the operation status of the public transport vehicle within the target time period;
  • a reminder module configured to execute an arrival reminder based on the traffic operation information.
  • an embodiment of the present application provides a terminal, the terminal including a processor and a memory; at least one computer instruction is stored in the memory, and the at least one computer instruction is loaded and executed by the processor to implement the arrival reminder method described above.
  • an embodiment of the present application provides a computer-readable storage medium, wherein at least one computer instruction is stored in the computer-readable storage medium, and the computer instruction is loaded and executed by a processor to implement the arrival reminder method of the above aspects.
  • a computer program product or computer program includes computer instructions stored in a computer-readable storage medium.
  • the processor of the terminal reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the terminal executes the arrival reminder method provided in various optional implementation manners of the above aspects.
  • in the solution shown in the embodiments of the present application, time-series-related global feature extraction is performed on the environmental sound data and the inertial sensor data respectively, and fusion feature extraction is then performed on the global sound features and the global inertial sensor features in combination with the relationship between the different modalities. This avoids the poor accuracy caused by external influences when the operating status of public transport vehicles is judged through single-modal features alone, improving the accuracy of the operating-status judgment and thereby the accuracy of the arrival reminder.
  • Fig. 1 is a schematic diagram of an application scenario according to an exemplary embodiment
  • Fig. 2 is a flow chart of a method for alerting arrival at a station according to an exemplary embodiment
  • Fig. 3 is a flowchart of a method for alerting arrival at a station according to an exemplary embodiment
  • Fig. 4 is a flow chart of a method for alerting arrival at a station according to another exemplary embodiment
  • Fig. 5 is a flowchart of Mel-frequency cepstral coefficient extraction involved in the embodiment shown in Fig. 4;
  • Fig. 6 is an architecture diagram of the classification model involved in the embodiment shown in Fig. 4;
  • Fig. 7 is a flow chart of a method for determining arrival at a station according to the embodiment shown in Fig. 4;
  • Fig. 8 is a structural block diagram of an arrival reminding device provided by an exemplary embodiment of the present application.
  • Fig. 9 shows a structural block diagram of a terminal provided by an exemplary embodiment of the present application.
  • the "plurality” mentioned herein means two or more.
  • “And/or” describes the association relationship of associated objects, indicating that there may be three types of relationships, for example, A and/or B may indicate: A exists alone, A and B exist simultaneously, and B exists independently.
  • the character “/” generally indicates that the contextual objects are an "or” relationship.
  • subsequent embodiments of the present application provide an arrival reminder solution, which can remind a user that the public transport vehicle is arriving at a station when the user rides public transportation.
  • FIG. 1 shows a schematic diagram of application scenarios involved in various embodiments of the present application.
  • a microphone 101 and an inertial sensor 102 are built into the terminal 100.
  • the terminal 100 may be a smart phone, a tablet computer, an e-book reader, a personal portable computer, and the like.
  • an application program with an arrival reminder function can be installed in the terminal 100, and the application program can combine the data collected by the microphone 101 and the inertial sensor 102 to perform an arrival reminder.
  • the terminal 100 can collect environmental sound data through the microphone 101, and collect inertial sensor data through the inertial sensor 102.
  • the environmental sound data and the inertial sensor data are combined with the route 140 of the public transport vehicle to determine whether to perform an arrival reminder, and when it is determined that an arrival reminder is necessary, an arrival reminder is sent to the user.
  • Fig. 2 shows a flow chart of an arrival reminder method provided by an exemplary embodiment of the present application.
  • the arrival reminding method may be performed by a terminal, and the terminal may be a terminal having a sound collection function and an inertial sensor data collection function, for example, the terminal may be the terminal 100 in the above-mentioned application scenario shown in FIG. 1 .
  • the arrival reminder method includes the following steps:
  • Step 201: acquire environmental sound data within a target time period and inertial sensor data within the target time period.
  • the terminal collects the environmental sound data within the target time period and the inertial sensor data within the target time period.
  • optionally, the terminal can collect environmental sound data and inertial sensor data periodically, collecting data of a specified duration in each collection period.
  • the environmental sound data and inertial sensor data within the above target time period refer to the data collected in one of the collection periods, for example, the environmental sound data and the inertial sensor data collected in the latest collection period.
  • an inertial sensor can also be called an inertial measurement unit (IMU). An inertial sensor is a device that measures the three-axis attitude angle (or angular rate) and acceleration of an object.
  • generally, an IMU includes three single-axis accelerometers and three single-axis gyroscopes. The accelerometers detect the acceleration signals of the object along the three independent axes of the carrier coordinate system, and the gyroscopes detect the angular velocity signals of the carrier relative to the navigation coordinate system; the IMU can thereby measure the angular velocity and acceleration of the object in three-dimensional space.
  • Step 202: based on the time series of the environmental sound data, perform feature extraction on the environmental sound data to obtain global sound features.
  • the terminal when the terminal performs feature extraction on the environmental sound data based on the time series of the environmental sound data in the target time period, global sound features related to the time series can be obtained.
  • since the global sound feature is extracted based on the time series of the environmental sound data, the global sound feature provides a better representation of the environmental sound data.
  • Step 203: based on the time series of the inertial sensor data, perform feature extraction on the inertial sensor data to obtain global inertial sensor features.
  • the terminal when the terminal performs feature extraction on the inertial sensor data based on the time series of the inertial sensor data in the target time period, the global inertial sensor features related to the time series can be obtained.
  • since the global inertial sensor feature is obtained through global feature extraction based on the time series of the inertial sensor data, the global inertial sensor feature provides a better representation of the inertial sensor data.
  • Step 204: fuse the global sound feature and the global inertial sensor feature based on the self-attention mechanism to obtain the fusion feature.
  • the terminal may perform feature fusion processing on the global features of the two modalities, the global sound feature and the global inertial sensor feature, through a self-attention mechanism, to obtain the features of the fusion of the two modalities.
  • the scheme shown in the embodiment of this application can perform feature fusion that combines the global sound features and the global inertial sensor features based on the relationship between them, which ensures the fusion effect between the features of the two modalities and improves the accuracy of subsequent traffic operation information acquisition based on the fusion features.
  • Step 205: acquire traffic operation information based on the fusion feature; the traffic operation information is used to indicate the operation status of the public transport vehicle within the target time period.
  • the aforementioned running state refers to the driving state of the public transport.
  • the aforementioned running states may include a constant speed driving state, a starting acceleration state, a braking deceleration state, a parking state, and the like.
  • the terminal can predict the traffic operation information corresponding to the fusion feature by processing and analyzing the obtained fusion feature, so as to determine the driving status of the public transport within the target time period.
  • Step 206: execute an arrival reminder based on the traffic operation information.
  • the terminal can predict whether the vehicle is arriving at a station based on the operation state indicated by the traffic operation information, and determine whether to issue an arrival reminder to the user based on the result of the arrival prediction.
  • the above-mentioned arrival reminder may be an arrival reminder for a target site (such as a destination site or a transfer site) in the route traveled by the public transport. That is to say, if it is predicted based on the traffic operation information that the public transport vehicle arrives or is about to arrive at the target site, an arrival reminder can be sent to the user.
  • in some cases, to prevent the interval between the moment the terminal sends the arrival reminder and the moment the public transport vehicle closes its doors and drives to the next station from being too short, which would cause the user to miss the time to get off, the terminal can be configured to send the arrival reminder in advance so that the user can prepare to get off.
  • the way of arrival reminder includes but not limited to: voice reminder, vibration reminder, interface reminder.
  • the station where the terminal is located may be determined in combination with the route map of the vehicle.
  • the terminal loads and stores the route map of the current city's transport vehicles in advance; for each line, the route map includes station information, transfer information, first and last train times, and a map of the area near each station.
  • before the terminal starts to execute the arrival reminder method shown in the embodiment of this application, it can first obtain the user's ride information.
  • optionally, the ride information includes the starting station, the target station, the map near the stations, and the first and last train times.
  • optionally, the station where the terminal is currently located can be determined in combination with the route map of the current public transport vehicle.
  • the global sound feature and the global inertial sensor feature are fused based on the self-attention mechanism to obtain the fused feature, including:
  • the spliced global sound features and global inertial sensor features are processed to obtain the respective attention weights of the global sound features and global inertial sensor features;
  • Fusion features are obtained based on the respective attention weights of global sound features and global inertial sensor features.
  • the global sound feature includes global sound sub-features corresponding to at least two time periods within the target time period;
  • the global inertial sensor feature includes global inertial sensor sub-features respectively corresponding to at least two time periods within the target time period ;
  • splicing the global sound features and the global inertial sensor features includes:
  • the spliced global sound features and global inertial sensor features are processed to obtain the respective attention weights of the global sound features and global inertial sensor features, including:
  • the spliced global sound features and global inertial sensor features are processed to obtain the respective attention weights of the global sound sub-features and the respective attention weights of the global inertial sensor sub-features;
  • obtaining the fusion features includes:
  • the fused features are obtained.
  • obtaining fusion features includes:
  • the weighted summation or weighted average of the global sound sub-features and the global inertial sensor sub-features is performed to obtain fusion features.
  • the environmental sound data includes at least two audio data segments; based on the timing of the environmental sound data, feature extraction is performed on the environmental sound data to obtain global sound features, including:
  • the local sound features of the at least two audio data segments are processed based on a self-attention mechanism to obtain global sound features.
  • the local sound features of the at least two audio data segments are processed based on a self-attention mechanism to obtain global sound features, including:
  • the respective sound local features of the at least two audio data segments are processed based on the self-attention mechanism, and the respective attention weights of the at least two audio data segments are obtained;
  • the local sound features of the at least two audio data segments are weighted to obtain global sound features.
  • the inertial sensor data includes at least two sensor data segments; based on the timing of the inertial sensor data, feature extraction is performed on the inertial sensor data to obtain global inertial sensor features, including:
  • the respective sensor local features of the at least two sensor data segments are processed based on a self-attention mechanism to obtain global inertial sensor features.
  • the respective sensor local features of the at least two sensor data segments are processed based on a self-attention mechanism to obtain global inertial sensor features, including:
  • the respective sensor local features of the at least two sensor data segments are processed based on a self-attention mechanism, and the respective attention weights of the at least two sensor data segments are obtained;
  • weighting is performed on the respective sensor local features of the at least two sensor data segments to obtain global inertial sensor features.
  • optionally, the traffic operation information is used to indicate whether the operation state of the public transport vehicle within the target time period is a stopped state.
  • performing the arrival reminder based on the traffic operation information includes:
  • in response to the traffic operation information indicating that the operation state is a stopped state, an arrival reminder is executed.
  • the arrival reminder is executed, including:
  • in response to the current location matching a designated station on the target route, the arrival reminder is executed; the designated station is the destination station or a transfer station on the target route.
  • optionally, before acquiring the environmental sound data within the target time period and the inertial sensor data within the target time period, the method further includes:
  • displaying a route setting interface, and obtaining the target route according to the starting station and destination station set by the user in the route setting interface.
  • optionally, before acquiring the environmental sound data within the target time period and the inertial sensor data within the target time period, the method further includes:
  • the arrival reminder is executed, including:
  • in the solution shown in the embodiments of the present application, the environmental sound data and inertial sensor data are collected in real time, time-series-related global feature extraction is performed on the environmental sound data and inertial sensor data respectively, and fusion feature extraction is then performed on the global sound features and the global inertial sensor features in combination with the relationship between the features of different modalities. This avoids the poor accuracy caused by external influences when the operating status of public transport vehicles is judged through single-modal features alone, improves the accuracy of the operating-status judgment, and thereby improves the accuracy of the arrival reminder.
  • the embodiment of the present application provides an arrival reminder method, the flow of which is shown in Fig. 3.
  • before using the arrival reminder function for the first time, the terminal executes step 301 to store the public transportation route map. When the terminal turns on the arrival reminder function, it first executes step 302 to determine the bus route. After the user enters the public transport vehicle, the terminal executes step 303, acquiring the environmental sound in real time through the microphone and collecting sensor data through its inertial sensor. The terminal then performs step 304, judging the start-stop state of the public transport vehicle from the collected environmental sound and sensor data, that is, determining whether the vehicle is in a stopped state or a running state. When the vehicle is judged to be running, the terminal continues to perform step 303; when the vehicle is judged to be stopped, it can be determined that the vehicle has entered a station. In step 305, the terminal combines the ride route and the number of stations already traveled to judge whether this station is the destination station. If the station entered is the destination station, step 306 is executed and the arrival reminder is sent; if it is not the destination station, step 307 is executed to judge whether the station is a transfer station.
  • in the solution shown in the embodiments of the present application, the sound acquired on the public transport vehicle and the inertial sensor data collected by the terminal's inertial sensor are combined to determine the start-stop status of the vehicle, so that, in combination with the target route, the station where the vehicle is stopped can be determined and the arrival reminder performed. Because the sound features and the features of the data collected by the inertial sensor are combined to judge the start-stop state, the solution avoids the errors that arise when the start-stop status is judged from sound data alone (which is affected by factors such as a blocked microphone) and also avoids the inaccurate judgments that arise when the start-stop status is judged from inertial sensor data alone (which is affected by changes in the terminal's posture). Combining these two kinds of data features makes them complementary, thereby improving the robustness of the start-stop judgment algorithm for public transport vehicles.
  • Fig. 4 shows a flow chart of an arrival reminder method provided by an exemplary embodiment of the present application.
  • the arrival reminding method can be performed by a terminal, for example, the terminal can be a terminal having a sound collection function and an inertial sensor data collection function, for example, the terminal can be the terminal 100 in the above application scenario shown in FIG. 1 .
  • the arrival reminder method includes the following steps:
  • Step 401: acquire the target route of the public transport vehicle.
  • the target route of the above-mentioned public transport vehicle can be set by the user; for example, the terminal can display a route setting interface and obtain the target route according to the starting station and destination station set by the user in the route setting interface.
  • for example, the terminal can display the route setting interface in an application program interface, and generate the target route from the starting station to the destination station by receiving the starting station and the destination station set by the user in the route setting interface.
  • the terminal may acquire user location information in real time, and determine the starting site according to the current location information of the user.
  • the terminal may also determine the starting site according to the user's selection operation on the starting site in the route setting interface. Similarly, the terminal may determine the destination site according to the user's selection operation of the destination site in the route setting interface.
  • the terminal may acquire at least one route passing through the starting site and the target site successively based on the pre-stored route map of the public transport, and determine the target route from the at least one route.
  • the terminal may automatically recommend one of the at least one routes, and determine the automatically recommended route as the target route, or, the terminal may also display at least one route on the interface, and determine the target route among them by receiving a user's selection operation.
  • for example, the number of interval stations between the starting station and the target station in each of the at least one route can be obtained, and the route with the fewest interval stations determined as the target route; alternatively, the estimated travel time of each of the at least one route from the starting station to the target station can be obtained, and the route with the shortest estimated travel time determined as the target route.
  • the terminal can confirm that the user has entered or is about to enter the public transportation, and at this time, the arrival reminder function can be enabled.
  • the user can manually input the start site and the destination site, and select an appropriate route as the target route.
  • the terminal may also perform route prediction based on the user's behavior habits to determine the target route.
  • the terminal predicts the route according to the user's historical movement track, and can obtain the target route.
  • the historical movement trajectory may be the movement trajectory of the user within a specified time period collected by the terminal.
  • for example, the terminal matches each movement trajectory in the historical movement trajectories against the complete route map of public transport and obtains the routes covered by each movement trajectory in the complete route map. If there is a specified route whose share of all covered routes is greater than a specified threshold, the specified route is determined as the target route. For example, suppose there are 3 historical movement trajectories, trajectory A, trajectory B, and trajectory C; trajectory A covers the route from station a to station b on the complete route map, trajectory B covers the route from station b to station c, and trajectory C covers the route from station a to station b. Since both trajectory A and trajectory C cover the route from station a to station b, and its share of all covered routes is greater than 1/2, the route from station a to station b is determined as the target route.
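  • the majority-vote inference in this example can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation; `match_to_route` is a hypothetical stand-in for the actual trajectory-to-route map matching, which the patent does not detail:

```python
# Illustrative sketch of the majority-vote route inference described above.
from collections import Counter

def predict_target_route(trajectories, match_to_route, threshold=0.5):
    routes = [match_to_route(t) for t in trajectories]
    route, count = Counter(routes).most_common(1)[0]
    # the route covered by more than `threshold` of all trajectories wins
    return route if count / len(trajectories) > threshold else None

# trajectories A and C both map to route a->b: 2/3 > 1/2, so a->b is chosen
print(predict_target_route(
    ["A", "B", "C"], {"A": "a->b", "B": "b->c", "C": "a->b"}.get))
```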
  • route prediction based on the user's historical movement trajectory can also be carried out by means of machine learning model prediction.
  • the historical movement trajectory is input into the target route prediction model, and the target route prediction model outputs the target route.
  • the target route prediction model may be a neural network model trained based on historical moving track samples and route labels.
  • for example, the terminal can obtain the user's movement trajectories within a specified time (such as one week or one month) before the current time, perform statistical analysis or machine learning model prediction on those movement trajectories, and obtain the target route of the public transport that the user will take.
  • Step 402: acquire environmental sound data within the target time period and inertial sensor data within the target time period.
  • after the user enters the public transport vehicle, the terminal can collect audio and sensor data in real time according to a certain period, and obtain the environmental sound data and the inertial sensor data within the target time period.
  • the terminal may then execute the step of acquiring traffic operation information.
  • for example, taking the target time period as 2 s, the terminal may acquire the environmental sound data and inertial sensor data collected within each 2 s as the data collected within the target time period.
  • Step 403: perform audio feature extraction on at least two audio data segments respectively to obtain the Mel-frequency cepstral coefficient features of the at least two audio data segments.
  • the environmental sound data within the target time period includes at least two audio data segments, and the terminal performs audio feature extraction on the at least two audio data segments to obtain the Mel-frequency cepstral coefficient features corresponding to the at least two audio data segments.
  • because the terminal microphone collects the environmental sound data in real time, the data is not stationary as a whole, but short stretches of it can be regarded as stationary. The environmental sound data within the target time period is therefore divided into frames, yielding at least two audio data segments arranged in continuous time order within the target time period.
  • for example, the sampling frequency of the environmental sound data may be 16 kHz, and the sampling frequency of the inertial sensor data may be 200 Hz. Since the sampling frequency of the environmental sound data is much higher than that of the inertial sensor data, the terminal may perform preliminary feature extraction on the environmental sound data so that its features within the target time period can be quantitatively matched with the features of the inertial sensor data within the target time period.
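  • the collection geometry this implies can be sketched as follows (shapes only; the capture APIs are platform-specific, and the six IMU channels assume the three accelerometer axes and three gyroscope axes mentioned earlier):

```python
import numpy as np

SR_AUDIO, SR_IMU = 16000, 200      # sampling frequencies from the embodiment
WINDOW_S, SEG_S = 2.0, 0.2         # 2 s target window, 200 ms segments

audio = np.random.randn(int(SR_AUDIO * WINDOW_S))   # (32000,) microphone samples
imu = np.random.randn(int(SR_IMU * WINDOW_S), 6)    # (400, 6): 3-axis accel + gyro

n_segs = int(WINDOW_S / SEG_S)                      # 10 segments per window
audio_segs = audio.reshape(n_segs, -1)              # (10, 3200)
imu_segs = imu.reshape(n_segs, -1, 6)               # (10, 40, 6)
```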
  • optionally, the terminal may perform preliminary feature extraction on the at least two audio data segments to obtain preliminary audio features, the preliminary audio features including Mel-Frequency Cepstral Coefficients (MFCC).
  • FIG. 5 is a flowchart of extraction of Mel-frequency cepstrum coefficients involved in the embodiment of the present application. As shown in Figure 5, the extraction of Mel frequency cepstral coefficients may include the following steps:
  • first, the audio data segment is pre-emphasized by the pre-emphasis module 501.
  • the pre-emphasis module can use a high-pass filter, which only allows signal components above a certain frequency to pass and suppresses components below that frequency, thereby removing unnecessary low-frequency interference in the audio data segment, such as human conversation, footsteps, and mechanical noise, and flattening the spectrum of the audio data segment.
  • the mathematical expression of the high-pass filter is:
  • H(z) = 1 − a·z⁻¹
  • where a is a correction coefficient, generally ranging from 0.95 to 0.97, and z⁻¹ denotes a one-sample delay of the audio signal of the audio data segment.
  • the noise-removed audio data segment is then divided into frames by the framing and windowing module 502 to obtain audio data corresponding to different audio frames.
  • for example, every 512 data points of audio data can be divided into one frame; when the sampling frequency of the audio data is 16 kHz, the duration of one frame of audio data is 32 ms.
  • after each frame of data is taken, this application slides backward by 16 ms and then takes the next frame, so that two adjacent frames of data overlap by 16 ms.
  • since the audio data after framing must undergo a discrete Fourier transform during subsequent feature extraction, and a frame of audio data has no obvious periodicity (the left end and the right end of the frame are discontinuous), errors arise, and the more frames there are, the greater the accumulated error. To make each frame exhibit the characteristics of a periodic signal, the scheme shown in the embodiment of the present application uses the framing and windowing module 502 to perform windowing on each frame.
  • the terminal may perform windowing processing on the audio frame by using a Hamming window.
  • by multiplying each frame of audio data by the Hamming window function, the resulting audio data exhibits obvious periodicity.
  • the functional form of the Hamming window is:
  • w(n) = 0.54 − 0.46·cos(2πn/M), 0 ≤ n ≤ M
  • where n is an integer ranging from 0 to M, and M is the number of points of the Fourier transform; this embodiment takes 512 data points as the number of points of the Fourier transform.
  • since it is difficult to obtain signal characteristics from the transformation of the audio signal in the time domain, the time-domain signal usually needs to be converted into an energy distribution in the frequency domain for processing. The terminal therefore first inputs the audio frame data into the Fourier transform module 503 for Fourier transform, and then inputs the transformed audio frame data into the energy spectrum calculation module 504 to calculate the energy spectrum of the audio frame data.
  • in order to convert the energy spectrum into a mel spectrum that conforms to human hearing, the energy spectrum is input into the mel filter processing module 505 for filtering.
  • the mathematical expression of the filtering processing is:
  • mel(f) = 2595·log₁₀(1 + f/700)
  • where f is the frequency point after the Fourier transform.
  • after obtaining the mel spectrum of the audio frame, the terminal takes its logarithm and applies a discrete cosine transform through the discrete cosine transform (DCT) module 506; the resulting DCT coefficients are the MFCC features.
  • for example, the embodiment of the present application can select 64-dimensional MFCC features. The input window length of the audio data can be selected as 200 ms, the duration of one frame of signal is 32 ms, and two adjacent frames of data overlap by 16 ms, so each 200 ms input window of data corresponds to a 12×64 feature matrix.
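  • the pipeline of Fig. 5 can be sketched in Python as follows. Frame, hop, and filter sizes follow the embodiment (16 kHz, 512-point/32 ms frames, 16 ms hop, 64 coefficients); the triangular mel filterbank construction is a standard textbook version and an assumption, since the patent does not spell it out:

```python
# Minimal MFCC sketch following Fig. 5: pre-emphasis -> framing ->
# Hamming window -> FFT energy spectrum -> mel filterbank -> log -> DCT.
import numpy as np
from scipy.fft import dct

def mel(f):                       # mel(f) = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=512, hop=256, n_mels=64, a=0.97):
    # pre-emphasis: y[n] = x[n] - a * x[n-1], i.e. H(z) = 1 - a * z^-1
    y = np.append(signal[0], signal[1:] - a * signal[:-1])
    # framing: 512-point (32 ms) frames with a 256-point (16 ms) hop
    n_frames = 1 + (len(y) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hamming(frame_len)           # Hamming windowing
    power = np.abs(np.fft.rfft(frames, frame_len)) ** 2 / frame_len
    # triangular mel filterbank from 0 Hz to the Nyquist frequency
    pts = np.floor((frame_len + 1) *
                   inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
                   / sr).astype(int)
    fbank = np.zeros((n_mels, frame_len // 2 + 1))
    for i in range(n_mels):
        l, c, r = pts[i], pts[i + 1], pts[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # log mel spectrum, then DCT: the DCT coefficients are the MFCC features
    return dct(np.log(np.maximum(power @ fbank.T, 1e-10)),
               type=2, axis=1, norm="ortho")

feats = mfcc(np.random.randn(3200))   # one 200 ms segment at 16 kHz
print(feats.shape)                    # (11, 64); the 12x64 above assumes edge padding
```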
  • Step 404: perform feature extraction on the respective Mel-frequency cepstral coefficient features of the at least two audio data segments to obtain the respective sound local features of the at least two audio data segments.
  • at least two audio data segments are each subjected to the above-mentioned MFCC feature extraction to obtain their corresponding MFCC features, and local feature extraction is performed on the MFCC features corresponding to the at least two audio data segments to obtain the sound local features corresponding to each of the at least two audio data segments.
  • the terminal may perform local feature extraction on the MFCC features corresponding to each audio data segment through the first convolutional neural network, so as to obtain the sound local features of each audio data segment.
  • the first convolutional neural network is a convolutional neural network (CNN) for extracting local features of each audio data segment; using a convolutional neural network for local feature extraction can remove redundant feature information from the MFCC features of each audio data segment.
  • for example, taking the target time period of 2 s as an example, the environmental sound data collected in the target time period can be divided into 10 audio data segments of 200 ms, and the MFCC features extracted from each audio data segment are separately passed through the first convolutional neural network to obtain the sound local features corresponding to each audio data segment.
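  • a minimal PyTorch sketch of such a first convolutional neural network is given below; the layer configuration and the feature dimension N = 128 are assumptions, since the patent does not specify them:

```python
# Maps each 200 ms segment's 12x64 MFCC matrix to one N-dimensional
# sound local feature (a sketch; layer sizes are illustrative).
import torch
import torch.nn as nn

class LocalFeatureCNN(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                           # 12x64 -> 6x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((3, 8)))
        self.fc = nn.Linear(32 * 3 * 8, feat_dim)

    def forward(self, x):                              # x: (batch, 1, 12, 64)
        return self.fc(self.conv(x).flatten(1))        # -> (batch, feat_dim)

segments = torch.randn(10, 1, 12, 64)                  # 10 segments of a 2 s window
local_feats = LocalFeatureCNN()(segments)              # (10, 128) sound local features
```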
  • Step 405: according to the time-domain order of the at least two audio data segments, process the respective sound local features of the at least two audio data segments based on the self-attention mechanism to obtain global sound features.
  • the terminal performs self-attention processing on the local sound features corresponding to each audio data segment in the order of first to last in the time domain, and obtains a global sound feature corresponding to the environmental sound data in the target time period .
  • the global sound feature is a global feature obtained by fusing the respective sound local features of each audio data segment based on the temporal relationship between each audio data segment.
  • the terminal may process the sound local features of the at least two audio data segments based on the self-attention mechanism according to the time-domain order of the at least two audio data segments, and obtain the respective attention weights of the at least two audio data segments; then, based on the respective attention weights, the respective sound local features of the at least two audio data segments are weighted to obtain the global sound features.
  • the terminal may input the local sound features of the at least two audio data segments into the first self-attention network in sequence in the time domain, and the first self-attention network performs time-sequence-related global feature extraction to obtain global sound features.
  • for example, taking 10 audio data segments in the target time period as an example, the terminal inputs the sound local features corresponding to the 10 audio data segments output by the first convolutional neural network into the first self-attention network in time-domain order; the first self-attention network determines the attention weight corresponding to each audio data segment based on the self-attention mechanism, and weights the sound local features of the 10 audio data segments by their corresponding attention weights to obtain the global sound features.
  • weighted summation or weighted splicing may be performed on the respective sound local features of the at least two audio data segments to obtain global sound features.
  • the terminal may also multiply the attention weight corresponding to each audio data segment by the sound local feature corresponding to each audio data segment, and then concatenate them as the global sound feature.
  • for example, if the sound local features of three audio data segments are A, B, and C, and the attention weights obtained based on the self-attention mechanism are (0.2, 0.6, 0.2), then the global sound feature D obtained by weighted splicing is (A*0.2, B*0.6, C*0.2).
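  • one plausible form of this first self-attention network, sketched in PyTorch (the attention variant and sizes are assumptions; the weighted-splice output matches the example above):

```python
# Self-attention over time-ordered local features; a learned score per
# segment is softmax-normalized into attention weights, and each local
# feature is multiplied by its weight ("weighted splicing").
import torch
import torch.nn as nn

class SelfAttentionPool(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                          # x: (batch, segments, dim)
        ctx, _ = self.attn(x, x, x)                # time-series-related context
        w = torch.softmax(self.score(ctx), dim=1)  # attention weight per segment
        return w * x                               # (batch, segments, dim)

local_feats = torch.randn(1, 10, 128)              # 10 sound local features
global_sound = SelfAttentionPool()(local_feats)    # 10 global sound sub-features
```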
  • Step 406: perform feature extraction on at least two sensor data segments to obtain the respective sensor local features of the at least two sensor data segments.
  • the inertial sensor data collected within the target time period includes at least two sensor data segments; local feature extraction is performed on each of the at least two sensor data segments, and the sensor local features corresponding to the at least two sensor data segments can be obtained.
  • the terminal may perform local feature extraction on each sensor data segment through the second convolutional neural network, and acquire sensor local features of each sensor data segment.
  • the second convolutional neural network is a convolutional neural network for extracting local features of each sensor data segment, and using the convolutional neural network to perform local feature extraction can remove redundant feature information in each sensor data segment.
  • for example, taking the target time period of 2 s as an example, the inertial sensor data collected in the target time period can be divided into 10 sensor data segments of 200 ms, and each sensor data segment is passed through the second convolutional neural network for local feature extraction to obtain the sensor local features corresponding to each sensor data segment.
  • Step 407: according to the time-domain order of the at least two sensor data segments, process the respective sensor local features of the at least two sensor data segments based on the self-attention mechanism to obtain global inertial sensor features.
  • the terminal performs self-attention processing on the sensor local features corresponding to each sensor data segment in time-domain order, and obtains a global inertial sensor feature corresponding to the inertial sensor data within the target time period.
  • the global inertial sensor feature is a global feature obtained by fusing the respective sensor local features of each sensor data segment based on the timing relationship between each sensor data segment.
  • the terminal may process the local sensor features of the at least two sensor data segments based on the self-attention mechanism according to the time domain sequence of the at least two sensor data segments, and obtain at least two sensor data segments respective attention weights; then, based on the respective attention weights of the at least two sensor data segments, weighting the respective sensor local features of the at least two sensor data segments to obtain global inertial sensor features.
  • the terminal may input the local sensor features of at least two sensor data segments into the second self-attention network in sequence in the time domain, and the second self-attention network performs time-series-related global feature extraction to obtain global inertial sensor features.
  • for example, taking 10 sensor data segments in the target time period as an example, the terminal inputs the inertial sensor local features corresponding to the 10 sensor data segments output by the second convolutional neural network into the second self-attention network in time-domain order; the second self-attention network determines the attention weight corresponding to each sensor data segment based on the self-attention mechanism, and weights the inertial sensor local features of the 10 sensor data segments by their respective attention weights to obtain the global inertial sensor features.
  • weighted summation or weighted splicing may be performed on the local inertial sensor features of the at least two sensor data segments to obtain global inertial sensor features.
  • the terminal may also multiply the attention weight corresponding to each inertial sensor data segment by the sensor local feature corresponding to each inertial sensor data segment, and then stitch them together as the global inertial sensor feature.
  • the attention weight obtained based on the self-attention mechanism is (0.1, 0.3, 0.6)
  • the global inertial sensor feature W is (X*0.1, Y*0.3, Z*0.6).
  • Step 408: splice the global sound features and the global inertial sensor features.
  • the terminal performs feature splicing on the global sound feature and the global inertial sensor feature to obtain the spliced global feature.
  • the global sound features include global sound sub-features corresponding to at least two time periods within the target time period, and the global inertial sensor features include global inertial sensor sub-features corresponding to the at least two time periods within the target time period. The global sound sub-features and the global inertial sensor sub-features corresponding to the at least two time periods within the target time period are spliced.
  • exemplarily, the spliced global feature can be (D, W); if D is (A*0.2, B*0.6, C*0.2), the global sound sub-features can be A*0.2, B*0.6, and C*0.2; if W is (X*0.1, Y*0.3, Z*0.6), the global inertial sensor sub-features can be X*0.1, Y*0.3, and Z*0.6.
  • the number of dimensions of the global sound sub-feature is the same as that of the global inertial sensor sub-feature.
  • the respective dimensions of the global sound sub-features are determined by the output feature dimensions of the first self-attention network, and the respective dimensions of the global inertial sensor sub-features are determined by the output feature dimensions of the second self-attention network.
  • for example, the first convolutional neural network extracts the sound local features of each audio data segment; if the feature dimension of each sound local feature is N and the target time period contains 10 audio data segments, the first convolutional neural network outputs a 10×N sound local feature vector composed of the 10 local features, and the 10×N sound local feature vector is input into the first self-attention network to extract a 10×N global sound feature vector. Similarly, the second convolutional neural network extracts the sensor local features of each sensor data segment; if the feature dimension of each sensor local feature is N and the target time period contains 10 sensor data segments, the second convolutional neural network outputs a 10×N sensor local feature vector composed of the 10 local features, which is input into the second self-attention network to extract a 10×N global inertial sensor feature vector.
  • after splicing, a 20×N spliced feature vector is finally obtained, where 20 is the time length; the 20×N feature vector represents 20 N-dimensional vector features.
  • Step 409: process the spliced global sound features and global inertial sensor features based on the self-attention mechanism to obtain the respective attention weights of the global sound features and the global inertial sensor features, and obtain the fusion features based on the respective attention weights of the global sound features and the global inertial sensor features.
  • the terminal processes the spliced global features through a self-attention mechanism, and obtains attention weights corresponding to the global sound features and the global inertial sensor features in the global features. Based on the attention weights corresponding to the two modalities, the terminal performs feature fusion on the global features of the two modalities to obtain fused features.
  • the spliced global sound features and global inertial sensor features are processed based on the self-attention mechanism, and the respective attention weights of the global sound sub-features and the respective attention weights of the global inertial sensor sub-features are obtained ; Obtain fusion features based on the respective attention weights of the global sound sub-features and the respective attention weights of the global inertial sensor sub-features.
  • the global sound sub-feature corresponding to each time segment may be a feature obtained after processing the sound local feature corresponding to each audio data segment based on a self-attention mechanism.
  • the global inertial sensor sub-features corresponding to each time period may be the features obtained after processing the sensor local features corresponding to each sensor data segment based on the self-attention mechanism.
  • the spliced global sound features and global inertial sensor features can extract the relationship between different modalities through the third self-attention network, and take into account the influence of timing on the fusion features, and perform global feature extraction to obtain fusion features.
  • the terminal performs weighted summation or weighted average of the global sound feature and the global inertial sensor feature based on the respective attention weights of the global sound feature and the global inertial sensor feature to obtain the fusion feature.
  • weighted summation or weighted averaging is performed on the global sound sub-features and the global inertial sensor sub-features to obtain fusion features.
  • exemplarily, if the spliced global feature is (A, B, X, Y), and the respective attention weights of the global sound sub-features and the global inertial sensor sub-features are 0.2, 0.3, 0.1, and 0.5, then multiplying each sub-feature by its attention weight and summing in the above manner yields the fusion feature A*0.2+B*0.3+X*0.1+Y*0.5.
  • optionally, based on the respective attention weights of the global sound features and the global inertial sensor features, the terminal may also multiply the global sound features and the global inertial sensor features by their respective attention weights, and then concatenate the two weighted features as the fusion features.
  • exemplarily, suppose the respective attention weights of the global sound feature and the global inertial sensor feature are 0.2 and 0.8. If the spliced global feature is (D, W), the fusion feature is (D*0.2, W*0.8). After multiplying by the respective attention weights in the above manner, the two features are spliced, and the resulting fusion features are (A*0.2*0.2, B*0.6*0.2, C*0.2*0.2, X*0.1*0.8, Y*0.3*0.8, Z*0.6*0.8).
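  • a PyTorch sketch of this fusion step (my own naming and sizes; the patent fixes only the mechanism, not the network configuration). The weighted-sum variant is returned, and the weighted-concatenation variant is shown in a comment:

```python
# Cross-modal fusion: splice the two 10xN global feature sequences into a
# 20xN sequence, run self-attention over it, derive a softmax attention
# weight per sub-feature, and combine the weighted sub-features.
import torch
import torch.nn as nn

class FusionAttention(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, sound, imu):                  # each: (batch, 10, dim)
        spliced = torch.cat([sound, imu], dim=1)    # splice: (batch, 20, dim)
        ctx, _ = self.attn(spliced, spliced, spliced)
        w = torch.softmax(self.score(ctx), dim=1)   # weight per sub-feature
        fused = (w * spliced).sum(dim=1)            # weighted sum: (batch, dim)
        # weighted-concatenation variant: (w * spliced).flatten(1) -> (batch, 20*dim)
        return fused

fused = FusionAttention()(torch.randn(1, 10, 128), torch.randn(1, 10, 128))
print(fused.shape)   # torch.Size([1, 128])
```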
  • the terminal when the terminal performs feature fusion on the global features of the two modalities based on the attention weights corresponding to the two modalities, it can further combine the smoothing parameters of the environmental sound data to perform feature fusion .
  • exemplarily, the terminal can obtain the respective average volume values of the at least two audio data segments; obtain the average volume value of the environmental sound data based on the respective average volume values of the at least two audio data segments; and then obtain the smoothing parameter of the volume of the environmental sound data from the average volume value of the environmental sound data and the respective average volume values of the at least two audio data segments, where the smoothing parameter is used to indicate the smoothness of the volume of the environmental sound data.
  • before performing feature fusion on the global features of the two modalities, the terminal can obtain an adjustment coefficient for the global sound features according to the smoothing parameter of the volume of the environmental sound data, and multiply the adjustment coefficient with the global sound features to obtain adjusted global sound features; when performing feature fusion on the global features of the two modalities, the adjusted global sound features and the global inertial sensor features can be spliced.
  • the adjustment coefficient is negatively correlated with the smoothing parameter of the volume of the environmental sound data; that is to say, the smoother the volume of the environmental sound data, the smaller the smoothing parameter and the larger the adjustment coefficient; correspondingly, the larger the smoothing parameter of the volume of the environmental sound data, the smaller the adjustment coefficient.
  • the above-mentioned smoothing parameter may be a parameter such as standard deviation or variance, which represents the degree of dispersion of the data set.
  • when the smoothing parameter of the environmental sound data is higher, indicating that the environmental sound data contains more irregular noise, a smaller adjustment coefficient (such as 0.9) can be used to suppress the global sound features and reduce their proportion in the fusion features; on the contrary, when the smoothing parameter of the environmental sound data is lower, indicating that the environmental sound data contains less irregular noise and has less impact on the subsequent arrival detection, the global sound features can be enhanced by a larger adjustment coefficient (such as 1.1) to increase their proportion in the fusion features.
  • in this way, the proportion of the global sound features in the fusion features can be flexibly adjusted, further improving the accuracy of subsequent arrival reminder judgments.
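  • a small sketch of this adjustment, taking the standard deviation of per-segment mean volumes as the smoothing parameter; the threshold and the 0.9/1.1 coefficients are illustrative only, since the patent fixes only the negative correlation:

```python
import numpy as np

def adjustment_coefficient(audio_segs, threshold=0.5):
    seg_vol = np.abs(audio_segs).mean(axis=1)   # average volume per segment
    smooth = seg_vol.std()                      # smoothing parameter (std)
    # negatively correlated: noisier (less smooth) sound -> smaller coefficient
    return 0.9 if smooth > threshold else 1.1

segs = np.random.randn(10, 3200)                # ten 200 ms audio segments
coeff = adjustment_coefficient(segs)
# adjusted_global_sound = coeff * global_sound_features, then splice as in step 408
```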
  • Step 410: acquire traffic operation information based on the fusion features.
  • the traffic operation information is used to indicate the operation state of the public transport within the target time period.
  • the terminal can classify the fused features through the fully connected network and the classifier, and output the traffic operation information of the public transport, so as to determine the operating status of the public transport within the target time period.
  • FIG. 6 is an architecture diagram of a classification model involved in the embodiment of the present application.
  • the classification model is stored in the terminal for judging the operating status of the public transport vehicle based on the environmental sound data and inertial sensor data collected by the terminal.
  • as shown in Fig. 6, the classification model includes a first convolutional network layer 61, a second convolutional network layer 62, a first self-attention network layer 63, a second self-attention network layer 64, a third self-attention network layer 65, a fully connected network layer 66, and a classifier 67.
  • the first convolutional network layer 61 is used to extract local features of the environmental sound data.
  • taking 2 s of environmental sound data as an example, the data is divided into 10 audio data segments of 200 ms, MFCC feature extraction is performed on the 10 audio data segments by the feature extraction module, and the extracted MFCC features are input in turn into the first convolutional network layer 61 for local feature extraction to obtain the sound local features corresponding to each audio data segment; each sound local feature is then input in time-domain order into the first self-attention network layer 63, and global sound features are obtained through the time-series-related global feature extraction of the first self-attention network layer 63.
  • the second convolutional network layer 62 is used to extract local features from the inertial sensor data.
  • similarly, the 2 s of inertial sensor data is divided into 10 sensor data segments of 200 ms, and the 10 sensor data segments are sequentially input into the second convolutional network layer 62 for local feature extraction to obtain the inertial sensor local features corresponding to each sensor data segment; each inertial sensor local feature is then input in time-domain order into the second self-attention network layer 64, and global inertial sensor features are obtained through the time-series-related global feature extraction of the second self-attention network layer 64.
  • the global inertial sensor features and the global sound features obtained within the 2s are input into the third self-attention network layer 65, which assigns self-attention weights across the modalities to extract global features; the third self-attention network layer 65 outputs the fusion features, which are input into the fully connected network layer 66 and the classifier 67, and the traffic operation information corresponding to the fusion features is output.
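To make the FIG. 6 pipeline concrete, below is a minimal PyTorch sketch of the dual-branch architecture. The ten 200 ms segments per 2 s window, the 12*64 MFCC blocks, and the six-axis IMU stream sampled at 200 Hz (40 samples per segment) follow this embodiment; the channel counts, attention head count, shared feature dimension, and the two-class stopped/running output are illustrative assumptions rather than values fixed by the application.

```python
import torch
import torch.nn as nn

class ArrivalClassifier(nn.Module):
    def __init__(self, n_feat=64, n_classes=2):
        super().__init__()
        # 61: local features of each 200 ms MFCC block (treated as a 1-channel 12x64 map)
        self.audio_cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
            nn.Linear(8, n_feat),
        )
        # 62: local features of each 200 ms IMU block (6 channels x 40 samples)
        self.imu_cnn = nn.Sequential(
            nn.Conv1d(6, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(8, n_feat),
        )
        # 63 / 64: time-series self-attention over the 10 per-segment features;
        # 65: self-attention over the 20 stacked audio + IMU global sub-features
        self.audio_attn = nn.MultiheadAttention(n_feat, 4, batch_first=True)
        self.imu_attn = nn.MultiheadAttention(n_feat, 4, batch_first=True)
        self.fusion_attn = nn.MultiheadAttention(n_feat, 4, batch_first=True)
        # 66 / 67: fully connected layer + classifier
        self.head = nn.Sequential(nn.Linear(n_feat, 32), nn.ReLU(), nn.Linear(32, n_classes))

    def forward(self, mfcc, imu):
        # mfcc: (B, 10, 12, 64) -- ten 200 ms MFCC matrices; imu: (B, 10, 6, 40)
        B, T = mfcc.shape[:2]
        a = self.audio_cnn(mfcc.reshape(B * T, 1, *mfcc.shape[2:])).reshape(B, T, -1)
        m = self.imu_cnn(imu.reshape(B * T, *imu.shape[2:])).reshape(B, T, -1)
        a, _ = self.audio_attn(a, a, a)      # global sound features, (B, 10, N)
        m, _ = self.imu_attn(m, m, m)        # global inertial sensor features, (B, 10, N)
        fused = torch.cat([a, m], dim=1)     # stacked 20 x N global features
        fused, _ = self.fusion_attn(fused, fused, fused)
        return self.head(fused.mean(dim=1)) # traffic operation logits
```

A forward pass with `torch.randn(1, 10, 12, 64)` and `torch.randn(1, 10, 6, 40)` yields a (1, 2) logit tensor.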
  • the classifier 67 can adopt classifiers based on different algorithms, such as an SVM (Support Vector Machine), a decision tree classification model, or a binary classification model.
  • the training process of the above classification model can be as follows: the model training device obtains sample environmental sound data and sample inertial sensor data within a sample time period; inputs the sample environmental sound data and sample inertial sensor data into the classification model to obtain the predicted traffic operation information output by the classification model; obtains a loss function value based on the predicted traffic operation information and the traffic operation information label corresponding to the sample environmental sound data and sample inertial sensor data; and updates the model parameters of the classification model based on the loss function value.
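A sketch of one training step matching that description, reusing the ArrivalClassifier sketch above; cross-entropy loss and the Adam optimizer are assumptions, since the application only refers to "a loss function value" and updating the model parameters.

```python
import torch
import torch.nn as nn

model = ArrivalClassifier()                      # sketch from the previous block
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()                  # assumed form of the loss function

def train_step(sample_mfcc, sample_imu, labels):
    # sample_mfcc: (B, 10, 12, 64); sample_imu: (B, 10, 6, 40);
    # labels: (B,) long tensor of traffic operation information labels
    predicted = model(sample_mfcc, sample_imu)   # predicted traffic operation information
    loss = loss_fn(predicted, labels)            # loss function value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                             # update the model parameters
    return loss.item()
```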
  • Step 411: execute the arrival reminder based on the traffic operation information.
  • the terminal determines the operation status of the public transport within the target time period through the acquired traffic operation information within the target time period, and executes the arrival reminder based on the acquired operation status of the public transport.
  • the traffic operation information is used to indicate whether the operation state of the public transportation tool within the target time period is a stopped state.
  • when the traffic operation information indicates that the operating state of the public transport vehicle within the target time period is the stopped state, an arrival reminder is executed.
  • in a possible implementation, when the traffic operation information indicates that the operating state of the public transport vehicle within the target time period is the stopped state, the terminal can obtain the current location of the public transport vehicle; when that location matches the designated station on the target route, the arrival reminder is executed.
  • the specified station may be a destination station or a transfer station on the target route.
  • the terminal can determine the station corresponding to the current location in combination with the previous stop conditions during the current ride of the public transportation.
  • exemplarily, in response to detecting the i-th stop of the public transport vehicle (for example, the vehicle is considered stopped when the duration of the stopped state is greater than or equal to a first threshold), the terminal determines that the station where the vehicle is located on the target route is the i-th station after the starting station; in response to the i-th station or the (i+1)-th station being the designated station, the terminal determines the station type of the designated station, the station types including destination station and transfer station; based on the station type of the designated station, it obtains the reminder information corresponding to that station type; and based on the reminder information, it performs the arrival reminder.
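A sketch of that stop-counting rule follows. Only the i-th-stop logic and the two station types come from the embodiment; the ordered station list, the reminder strings, and the function name are illustrative assumptions.

```python
def on_ith_stop(i, route, destination, transfer_stations):
    # route: ordered stations on the target route, route[0] being the starting station.
    # After the i-th detected stop, the vehicle is at the i-th station after the start.
    current = route[i]
    upcoming = route[i + 1] if i + 1 < len(route) else None
    for station in (current, upcoming):
        if station == destination:
            return f"Arriving at destination station {station}, please get off"
        if station in transfer_stations:
            return f"Arriving at transfer station {station}, please transfer"
    return None  # neither the i-th nor the (i+1)-th station is a designated station
```

For example, on the route ['A', 'B', 'C', 'D'] with destination 'C', the first detected stop (i=1, at station 'B') already triggers the reminder, because the (i+1)-th station is checked as well.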
  • alternatively, when the terminal detects that the operating state of the public transport vehicle is the stopped state, it can obtain the current location of the vehicle and, combining that location with the locations of the stations on the target route, determine the station corresponding to the current location.
  • the terminal may also acquire the current location of the terminal through a positioning system, or may determine the current location of the terminal through inertial sensor data.
  • since a public transport vehicle may stop at non-station locations (for example, a bus may stop for a red light, and even a rail vehicle may stop between stations because of line scheduling or a line fault), the number of stops alone cannot accurately determine the station where the vehicle is located; therefore, the method shown in the embodiment of the present application can also obtain the current location of the terminal through a positioning system (for example, through satellite positioning, cellular network positioning, or wireless access point positioning).
  • optionally, there may be locations on the target route that cannot be located by the positioning system (for example, in an underground tunnel with no signal); in that case, starting from the terminal position last obtained through the positioning system, the terminal can determine its movement track from the inertial sensor data and, combining the map data built into the terminal with that movement track, determine the current location of the terminal.
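That inertial fallback can be sketched as dead reckoning from the last positioning fix. This is a deliberately simplified illustration: it assumes the accelerometer samples have already been rotated into the map frame and gravity-compensated, and it omits the drift correction and map matching a real implementation would need; all names are illustrative.

```python
import numpy as np

def dead_reckon(last_fix_xy, accel_map_frame, dt):
    # last_fix_xy: (x, y) of the last position obtained from the positioning system.
    # accel_map_frame: (T, 2) horizontal acceleration in the map frame; dt: sample spacing (s).
    velocity = np.cumsum(accel_map_frame, axis=0) * dt       # integrate acceleration
    displacement = np.cumsum(velocity, axis=0) * dt          # integrate velocity
    track = np.asarray(last_fix_xy) + displacement           # movement track since the fix
    return track[-1]  # estimate to be matched against the built-in map data
```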
  • in a possible implementation, when the traffic operation information acquired N consecutive times indicates that the operating state of the public transport vehicle is the stopped state, an arrival reminder is performed.
  • FIG. 7 is a flow chart of a method for determining arrival at a station according to an embodiment of the present application.
  • first, step 71 is executed to collect the environmental sound data in the target time period; then step 72 is executed to perform audio feature preprocessing on the environmental sound data and extract the MFCC features of each audio data segment corresponding to the environmental sound data, i.e., as in the above embodiment, the features corresponding to each 200 ms window of data form a 12*64 matrix.
  • then step 73 is executed: the extracted MFCC features of each audio data segment, together with the collected inertial sensor data, are input into the network model structure in which a convolutional neural network and a self-attention network perform multi-modal feature fusion.
  • in this structure, the environmental sound data and the inertial sensor data are both 2s long; since the self-attention network is suited to extracting time-series features, each 2s of data is split into ten 200ms data segments, each 200ms segment being the data of one independent frame.
  • the CNN extracts the local features of each independent frame, forming 10 local features, which are then input into the self-attention network to extract time-series-related features; this extracts the overall features of the data better and thus combines the respective advantages of the CNN and the self-attention network, i.e. the CNN is better at extracting local features, while the self-attention network is better at extracting global temporal features.
  • after the fusion features are obtained, it is judged whether the vehicle is in the stopped state within the target time period; if it is, step 74 is performed to continuously detect the running state of the public transport vehicle.
  • for example, a subway generally remains stopped at a station for 20s or more; since the data window length of the input model in the embodiment of the present application is 2 seconds, the running state of the subway can be detected continuously, and if the subway is detected to be in the stopped state 5 consecutive times, the subway is considered to have arrived at a station.
  • accordingly, if entering a station requires the public transport vehicle to remain stopped for 20s or more, and the target time period in the above embodiment is 2s, then when the traffic operation information acquired 5 consecutive times indicates that the vehicle is in the stopped state, the terminal determines whether the current station is the designated station; if so, it executes the arrival reminder corresponding to the designated station.
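The five-consecutive-windows rule can be written as a small state machine; a sketch follows, assuming one classification result per 2 s window and a label convention of 1 for the stopped state.

```python
STOPPED = 1       # assumed label for the stopped state
N_REQUIRED = 5    # five consecutive 2 s windows, i.e. at least 10 s continuously stopped

class ArrivalDetector:
    def __init__(self):
        self.consecutive_stopped = 0
        self.stop_count = 0                    # i: station stops detected so far

    def update(self, predicted_state):
        # Feed one traffic-operation result per window; True means "arrived at a station".
        if predicted_state == STOPPED:
            self.consecutive_stopped += 1
            if self.consecutive_stopped == N_REQUIRED:
                self.stop_count += 1           # counts the i-th stop used above
                return True
        else:
            self.consecutive_stopped = 0       # vehicle running again: reset the streak
        return False
```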
  • for example, if the terminal judges that the user has arrived at the target station, it reminds the user that the vehicle is arriving and to get off in time, and can push map information near the designated station; if the terminal judges that the user has arrived at a transfer station, it reminds the user to transfer and can also provide the first and last departure times of the transfer line.
  • to sum up, in the embodiment of the present application, environmental sound data and inertial sensor data are collected in real time, time-series-related global feature extraction is performed on each, and fusion features are then extracted from the global sound features and the global inertial sensor features by exploiting the relationship between the different modalities; this avoids the poor accuracy caused by external influences when the operating state of a public transport vehicle is judged from a single modality alone, and because the accuracy of the operating-state judgment is improved, the accuracy of the arrival reminder is improved as well.
  • Fig. 8 shows a structural block diagram of an arrival reminding device provided by an exemplary embodiment of the present application.
  • the arrival reminder device is used to execute all or part of the steps performed by the terminal in the scheme shown in Figure 2 or Figure 4 above, and the arrival reminder device includes:
  • a data acquisition module 810 configured to acquire environmental sound data within a target time period and inertial sensor data within the target time period
  • the first feature extraction module 820 is configured to perform feature extraction on the environmental sound data based on the time sequence of the environmental sound data to obtain global sound features;
  • the second feature extraction module 830 is configured to perform feature extraction on the inertial sensor data based on the time series of the inertial sensor data, to obtain global inertial sensor features;
  • a feature fusion module 840 configured to fuse the global sound features and the global inertial sensor features based on a self-attention mechanism to obtain fusion features
  • An information acquisition module 850 configured to acquire traffic operation information based on the fusion feature; the traffic operation information is used to indicate the operation status of public transport within the target time period;
  • a reminder module 860 configured to execute arrival reminders based on the traffic operation information.
  • the feature fusion module 840 includes:
  • a feature splicing submodule configured to splice the global sound features and the global inertial sensor features
  • the weight acquisition submodule is used to process the spliced global sound features and the global inertial sensor features based on a self-attention mechanism, and obtain respective attention weights of the global sound features and the global inertial sensor features;
  • the feature fusion submodule is configured to acquire the fusion feature based on the respective attention weights of the global sound feature and the global inertial sensor feature.
  • the global sound feature includes global sound sub-features corresponding to at least two time periods in the target time period;
  • the global inertial sensor feature includes global inertial sensor sub-features corresponding to at least two time periods in the target time period.
  • the feature splicing submodule includes:
  • a splicing unit configured to splice the global sound sub-features and the global inertial sensor sub-features respectively corresponding to at least two time periods in the target time period; the number of dimensions of the global sound sub-features is the same as the number of dimensions of the global inertial sensor sub-features;
  • the weight acquisition submodule includes:
  • a weight unit configured to process the spliced global sound features and the global inertial sensor features based on a self-attention mechanism, and obtain the respective attention weights of the global sound sub-features and the respective global inertial sensor sub-features attention weight;
  • the feature fusion submodule includes:
  • a fused feature acquiring unit configured to acquire the fused feature based on respective attention weights of the global sound sub-features and respective attention weights of the global inertial sensor sub-features.
  • the fusion feature acquisition unit is configured to:
  • perform weighted summation or weighted averaging on the global sound sub-features and the global inertial sensor sub-features based on their respective attention weights, to obtain the fusion feature (a minimal sketch of this weighted fusion follows).
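For reference, a minimal numpy sketch of this weighted fusion, assuming ten N-dimensional sub-features per modality as in the embodiment; the array layout and function name are illustrative.

```python
import numpy as np

def fuse_sub_features(sound_subs, imu_subs, attn_weights, mode="sum"):
    # sound_subs, imu_subs: (10, N) global sub-features of the two modalities;
    # attn_weights: (20,) self-attention weights over the spliced sequence.
    spliced = np.concatenate([sound_subs, imu_subs], axis=0)    # (20, N)
    weighted = np.asarray(attn_weights)[:, None] * spliced
    return weighted.sum(axis=0) if mode == "sum" else weighted.mean(axis=0)
```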
  • the environmental sound data includes at least two audio data segments
  • the first feature extraction module 820 includes:
  • the first extraction submodule is used to perform audio feature extraction on at least two of the audio data segments, respectively, to obtain the respective Mel frequency cepstral coefficient features of at least two of the audio data segments;
  • the first local acquisition sub-module is used to perform feature extraction on the respective Mel frequency cepstral coefficient features of at least two of the audio data segments, and obtain the respective sound local features of at least two of the audio data segments;
  • the first global acquisition submodule is configured to process the local sound features of at least two audio data segments based on a self-attention mechanism according to the time domain sequence of at least two audio data segments, to obtain the global sound feature.
  • the first global acquisition submodule includes:
  • the first weight acquisition unit is configured to process the respective sound local features of the at least two audio data segments based on a self-attention mechanism according to the time-domain order of the at least two audio data segments, to obtain the respective attention weights of the at least two audio data segments;
  • the first global acquisition unit is configured to perform weighting processing on the sound local features of the at least two audio data segments based on the respective attention weights of the at least two audio data segments to obtain the global sound features.
  • the inertial sensor data includes at least two sensor data segments
  • the second feature extraction module 830 includes:
  • the second local acquisition sub-module is used to perform feature extraction on at least two of the sensor data segments, and obtain respective sensor local features of at least two of the sensor data segments;
  • the second global acquisition submodule is configured to process the local sensor features of the at least two sensor data segments based on a self-attention mechanism according to the time-domain order of the at least two sensor data segments, to obtain the global inertial sensor features.
  • the second global acquisition submodule includes:
  • the second weight acquisition unit is configured to process the respective sensor local features of the at least two sensor data segments based on a self-attention mechanism according to the time-domain order of the at least two sensor data segments, to obtain the respective attention weights of the at least two sensor data segments;
  • the second global acquisition unit is configured to perform weighting processing on the sensor local features of the at least two sensor data segments based on the respective attention weights of the at least two sensor data segments to obtain the global inertial sensor features.
  • the traffic operation information is used to indicate whether the operation state of the public transport within the target time period is a stop state
  • the reminder module 860 includes:
  • the reminder sub-module is used to execute an arrival reminder when the traffic operation information indicates that the operation state of the public transportation vehicle within the target time period is in a stopped state.
  • the reminder submodule includes:
  • a position acquiring unit configured to acquire the current position of the public transport when the traffic operation information indicates that the operating state of the public transport within the target time period is a stopped state
  • the reminding unit is configured to execute an arrival reminder when the current position of the public transport vehicle matches a designated station on the target route; the designated station is a destination station or a transfer station on the target route.
  • the device further includes:
  • the interface display module is used to display the route setting interface before acquiring the environmental sound data in the target time period and the inertial sensor data in the target time period;
  • a target route acquiring module configured to acquire the target route according to the starting site and destination site set by the user in the route setting interface.
  • the terminal further includes:
  • a target route acquiring module configured to perform route prediction according to the user's historical movement trajectory, and acquire the target route.
  • the reminder submodule includes:
  • the arrival reminder unit is configured to execute arrival reminders when the traffic operation information acquired N times in a row indicates that the operation status of the public transport is stopped.
  • to sum up, in the embodiment of the present application, environmental sound data and inertial sensor data are collected in real time, time-series-related global feature extraction is performed on each, and fusion features are then extracted from the global sound features and the global inertial sensor features by exploiting the relationship between the different modalities; this avoids the poor accuracy caused by external influences when the operating state of a public transport vehicle is judged from a single modality alone, and because the accuracy of the operating-state judgment is improved, the accuracy of the arrival reminder is improved as well.
  • Fig. 9 shows a structural block diagram of a terminal provided by an exemplary embodiment of the present application.
  • the terminal may be a smart phone, a tablet computer, an e-book reader, a portable personal computer, or another electronic device on which application programs are installed and run.
  • a terminal in this application may include one or more of the following components: a processor 910, a memory 920 and a screen 930.
  • Processor 910 may include one or more processing cores.
  • the processor 910 uses various interfaces and lines to connect the various parts of the entire terminal, and executes the various functions of the terminal and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 920 and calling the data stored in the memory 920.
  • optionally, the processor 910 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA).
  • the processor 910 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like.
  • the CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing the content to be displayed on the screen 930; and the modem is used for processing wireless communication. It can be understood that the above-mentioned modem may also not be integrated into the processor 910 and may instead be implemented by a separate communication chip.
  • the memory 920 may include random access memory (Random Access Memory, RAM), and may also include read-only memory (Read-Only Memory, ROM).
  • the memory 920 includes a non-transitory computer-readable storage medium.
  • the memory 920 may be used to store instructions, programs, codes, sets of codes, or sets of instructions.
  • the memory 920 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, an image playback function, etc.), instructions for implementing the above-mentioned various method embodiments, etc.
  • the operating system may be an Android system (including a system deeply developed based on the Android system), an IOS system (including a system deeply developed based on the IOS system) or other systems.
  • the storage data area can also store data (such as phone book, audio and video data, and chat record data) created by the terminal during use.
  • the screen 930 may be a capacitive touch display screen, which is used for receiving user's touch operation on or near it with any suitable object such as finger or stylus, and displaying user interfaces of various application programs.
  • the structure of the terminal shown in the above drawings does not constitute a limitation on the terminal; the terminal may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components.
  • the embodiment of the present application also provides a computer-readable storage medium, in which at least one computer instruction is stored, the at least one computer instruction being loaded and executed by a processor to implement the arrival reminder method described in the above embodiments.
  • according to an aspect of the present application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer-readable storage medium.
  • the processor of the terminal reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the terminal executes the arrival reminder method provided in various optional implementation manners of the above aspect.

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Navigation (AREA)
  • Traffic Control Systems (AREA)

Abstract

An arrival reminder method, apparatus, terminal, storage medium and program product, belonging to the technical field of terminals. The method includes: acquiring environmental sound data within a target time period and inertial sensor data within the target time period (201); performing feature extraction on the environmental sound data based on the time sequence of the environmental sound data to obtain global sound features (202); performing feature extraction on the inertial sensor data based on the time sequence of the inertial sensor data to obtain global inertial sensor features (203); fusing the global sound features and the global inertial sensor features based on a self-attention mechanism to obtain fusion features (204); acquiring traffic operation information based on the fusion features (205); and executing an arrival reminder based on the traffic operation information (206). The above solution improves the accuracy of the arrival reminder.

Description

到站提醒方法、装置、终端、存储介质及程序产品
本申请要求于2021年10月26日提交的申请号为202111249921.8、发明名称为“到站提醒方法、装置、终端及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本公开涉及终端技术领域,特别涉及一种到站提醒方法、装置、终端、存储介质及程序产品。
背景技术
目前,人们乘坐地铁等公共交通工具时,需要时刻注意当前停靠站点是否为自己需要下车的目标站点,随着终端技术的发展,终端可以具有到站提醒功能,用于提醒乘客在到达目标站点时及时下车。
在相关技术中,终端通常利用内嵌的加速度计进行加速度采集,根据加速度计实时记录的加速度值实时确定当前乘坐的交通工具的加速度情况。比如,若终端检测到加速度大于零则判断交通工具处于启动阶段,若检测到加速度小于零则判断交通工具正在减速进站,然后结合地铁线路图以及用户的需求判断用户是否到站或者需要换乘,进而终端进行到站或者换乘提醒。
然而,目前通过终端加速度计传感器记录加速度方向判断是否到站的方式,与手机的姿态有较大关系,通过终端的加速度计记录的数值难以准确判断地铁是处于加速还是减速状态,因此存在到站提醒不准确的问题。
发明内容
本申请实施例提供了一种到站提醒方法、装置、终端、存储介质及程序产品,可以提高公共交通工具的运行状态判断的准确性,进而提高了终端进行到站提醒的准确性。所述技术方案如下:
一方面,本申请实施例提供了一种到站提醒方法,所述方法由终端执行,所述方法包括:
获取目标时间段内的环境音数据和所述目标时间段内的惯性传感器数据;
基于所述环境音数据的时序对所述环境音数据进行特征提取,获得全局声音特征;
基于所述惯性传感器数据的时序对所述惯性传感器数据进行特征提取,获得全局惯性传感器特征;
基于自注意力机制对所述全局声音特征和所述全局惯性传感器特征进行融合处理,获得融合特征;
基于所述融合特征获取交通运行信息;所述交通运行信息用于指示公共交通工具在所述目标时间段内的运行状态;
基于所述交通运行信息执行到站提醒。
另一方面,本申请实施例提供了一种到站提醒装置,所述装置包括:
数据获取模块,用于获取目标时间段内的环境音数据和所述目标时间段内的惯性传感器数据;
第一特征提取模块,用于基于所述环境音数据的时序对所述环境音数据进行特征提取,获得全局声音特征;
第二特征提取模块,用于基于所述惯性传感器数据的时序对所述惯性传感器数据进行特征提取,获得全局惯性传感器特征;
特征融合模块,用于基于自注意力机制对所述全局声音特征和所述全局惯性传感器特征进行融合处理,获得融合特征;
信息获取模块,用于基于所述融合特征获取交通运行信息;所述交通运行信息用于指示公共交通工具在所述目标时间段内的运行状态;
提醒模块,用于基于所述交通运行信息执行到站提醒。
另一方面,本申请实施例提供了一种终端,所述终端包括处理器和存储器;所述存储器中存储有至少一条计算机指令,所述至少一条计算机指令由所述处理器加载并执行以实现如上述方面所述的到站提醒方法。
另一方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条计算机指令,所述计算机指令由处理器加载并执行以实现如上述方面所述的到站提醒方法。
根据本申请的一个方面,提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。终端的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该终端执行上述方面的各种可选实现方式中提供的到站提醒方法。
本申请实施例提供的技术方案的有益效果至少包括:
通过实时采集环境音数据以及惯性传感器数据,并分别对环境音数据以及惯性传感器数据进行时序相关的全局特征提取,然后基于全局声音特征以及全局惯性传感器特征,结合不同模态之间的关系进行融合特征提取,避免了仅通过单一模态特征进行公共交通工具的运行状态判断时,受到外界影响导致准确性较差的情况,由于提高了公共交通工具的运行状态判断的准确性,进而提高了到站提醒的准确性。
附图说明
图1是根据一示例性实施例示出的一种应用场景示意图;
图2是根据一示例性实施例示出的一种到站提醒方法的流程图;
图3是根据一示例性实施例示出的一种到站提醒方法的流程图;
图4是根据另一示例性实施例示出的一种到站提醒方法的流程图;
图5是图4所示实施例涉及的一种梅尔频率倒谱系数提取流程图;
图6是图4所示实施例涉及的一种分类模型架构图;
图7是图4所示实施例涉及的一种到站判断方法的流程图;
图8是本申请一个示例性实施例提供的到站提醒装置的结构框图;
图9示出了本申请一个示例性实施例提供的终端的结构方框图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
在本文中提及的“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。
本申请后续实施例提供了一种到站提醒的方案,可以在用户日常乘坐公共交通工具时,对公共交通工具所到达的站点进行提醒。
请参考图1,其示出了本申请各个实施例涉及的应用场景示意图。如图1所示,终端100中内置有麦克风101和惯性传感器102。比如,该终端100可以是智能手机、平板电脑、电子书阅读器、个人便携式计算机等。
可选的,终端100中可以安装具有到站提醒功能的应用程序,该应用程序可以结合麦克 风101和惯性传感器102采集的数据进行到站提醒。
比如,用户携带终端100乘坐公共交通工具120时,若应用程序启动了到站提醒功能,则终端100可以通过麦克风101采集环境音数据,并通过惯性传感器102采集惯性传感器数据,应用程序基于环境音数据和惯性传感器数据,结合公共交通工具的路线140,确定是否进行到站提醒,并在确定需要进行到站提醒时,向用户发出到站提醒。
图2示出了本申请一个示例性实施例提供的到站提醒方法的流程图。其中,该到站提醒方法可以由终端执行,该终端可以是具有声音采集功能以及惯性传感器数据采集功能的终端,例如,该终端可以是上述图1所示应用场景中的终端100。该到站提醒方法包括如下步骤:
步骤201,获取目标时间段内的环境音数据和目标时间段内的惯性传感器数据。
在本申请实施例中,终端采集目标时间段内的环境音数据以及目标时间段内的惯性传感器数据。
比如,终端可以按照指定的时长周期进行环境音数据和惯性传感器数据的采集,每个采集周期采集一段指定时长内的环境音数据和惯性传感器数据,上述目标时间段内的环境音数据和惯性传感器数据,即为其中一个采集周期内采集到的数据,例如可以是最近一个采集周期内采集到的环境音数据和惯性传感器数据。
其中,环境音数据可以通过终端的麦克风组件进行采集。惯性传感器也可以称为惯性测量单元(Inertial Measurement Unit,IMU),惯性传感器是测量物体三轴姿态角(或角速率)以及加速度的装置。一般的,一个IMU包含了三个单轴的加速度计和三个单轴的陀螺,加速度计用于检测物体在载体坐标系统独立三轴的加速度信号,而陀螺用于检测载体相对于导航坐标系的角速度信号,通过IMU可以测量物体在三维空间中的角速度和加速度。
步骤202,基于环境音数据的时序,对环境音数据进行特征提取获得全局声音特征。
在本申请实施例中,终端基于环境音数据在目标时间段中的时序,对环境音数据进行特征提取时,可以获得与时序相关的全局声音特征。
由于全局声音特征是基于环境音数据的时序进行全局特征提取得到的,因此,该全局声音特征对环境音数据具有更好的表征性。
步骤203,基于惯性传感器数据的时序,对惯性传感器数据进行特征提取获得全局惯性传感器特征。
在本申请实施例中,终端基于惯性传感器数据在目标时间段中的时序,对惯性传感器数据进行特征提取时,可以获得与时序相关的全局惯性传感器特征。
与全局声音特征类似的,由于全局惯性传感器特征是基于惯性传感器数据的时序进行全局特征提取得到的,因此,该全局惯性传感器特征对惯性传感器数据具有更好的表征性。
步骤204,基于自注意力机制对全局声音特征和全局惯性传感器特征进行融合处理,获得融合特征。
在本申请实施例中,终端可以通过自注意力机制对全局声音特征以及全局惯性传感器特征这两种模态的全局特征进行特征融合处理,得到两种模态融合的特征。
由于自注意力机制可以更好的提取不同模态的数据之间的关系,因此,本申请实施例所示的方案可以结合全局声音特征以及全局惯性传感器特征这两种模态的数据之间的关系进行特征融合,能够保证两种模态的特征之间的融合效果,继而提升后续基于融合特征进行交通运行信息获取的准确度。
步骤205,基于融合特征获取交通运行信息;交通运行信息用于指示公共交通工具在目标时间段内的运行状态。
在本申请实施例中,上述运行状态是指公共交通工具的行驶状态。比如,上述运行状态可以包括匀速行驶状态、启动加速状态、刹车减速状态以及停车状态等等。
其中,终端通过对获取到的融合特征进行处理分析,可以预测该融合特征对应的交通运 行信息,从而确定公共交通工具在目标时间段内的行驶状态。
步骤206,基于交通运行信息执行到站提醒。
在本申请实施例中,终端基于交通运行信息所指示的运行状态,可以预测终端是否到站,并基于到站预测的结果,确定是否向用户发出到站提醒。
可选的,上述到站提醒可以是对公共交通工具行驶的路线中的目标站点(例如目的站点或者换乘站点)进行到站提醒。也就是说,如果基于交通运行信息预测公共交通工具到达或者将要到达目标站点时,可以向用户发出到站提醒。
比如,为了防止终端发出到站提醒的时间与公共交通工具关门驶往下一站之间的时间过短,导致用户错过下车时间,可以设置当到达目标站点的前一站时发送即将到站的消息提示,以便用户提前做好下车准备。
可选的,到站提醒的方式包括但不限定于:语音提醒、震动提醒、界面提醒。
其中,终端所处的站点可以结合交通工具的线路图来确定。比如,终端中事先加载并存储当前所在城市的交通工具的线路图,线路图中包含每条线路的站点信息、换乘信息、首末班时间及站点附近地图等。终端开始执行本申请实施例所示的到站提醒方法之前,可以先获取用户的乘车信息,乘车信息包括起始站点、目标站点、站点附近地图以及首末班时间等,在执行本申请实施例所示的到站提醒方法的过程中,可以结合当前乘坐的公共交通工具的线路图确定终端当前所处的站点。
在一些实施例中,基于自注意力机制对全局声音特征和全局惯性传感器特征进行融合处理,获得融合特征,包括:
将全局声音特征和全局惯性传感器特征进行拼接;
基于自注意力机制对拼接后的全局声音特征和全局惯性传感器特征进行处理,获得全局声音特征和全局惯性传感器特征各自的注意力权重;
基于全局声音特征和全局惯性传感器特征各自的注意力权重,获取融合特征。
在一些实施例中,全局声音特征包括目标时间段内的至少两个时间段各自对应的全局声音子特征;全局惯性传感器特征包括目标时间段内至少两个时间段各自对应的全局惯性传感器子特征;
将全局声音特征和全局惯性传感器特征进行拼接,包括:
将目标时间段内的至少两个时间段各自对应的全局声音子特征以及全局惯性传感器子特征进行拼接;全局声音子特征的维度数量,与全局惯性传感器子特征的维度数量相同;
基于自注意力机制对拼接后的全局声音特征和全局惯性传感器特征进行处理,获得全局声音特征和全局惯性传感器特征各自的注意力权重,包括:
基于自注意力机制对拼接后的全局声音特征和全局惯性传感器特征进行处理,获得全局声音子特征各自的注意力权重和全局惯性传感器子特征各自的注意力权重;
基于全局声音特征和全局惯性传感器特征各自的注意力权重,获取融合特征,包括:
基于全局声音子特征各自的注意力权重和全局惯性传感器子特征各自的注意力权重,获取融合特征。
在一些实施例中,基于全局声音子特征各自的注意力权重和全局惯性传感器子特征各自的注意力权重,获取融合特征,包括:
基于全局声音子特征各自的注意力权重和全局惯性传感器子特征各自的注意力权重,对全局声音子特征和全局惯性传感器子特征进行加权求和或者加权平均,获得融合特征。
在一些实施例中,环境音数据包含至少两个音频数据段;基于环境音数据的时序,对环境音数据进行特征提取获得全局声音特征,包括:
对至少两个音频数据段分别进行音频特征提取,获得至少两个音频数据段各自的梅尔频率倒谱系数特征;
对至少两个音频数据段各自的梅尔频率倒谱系数特征进行特征提取,获得至少两个音频 数据段各自的声音局部特征;
按照至少两个音频数据段的时域顺序,基于自注意力机制对至少两个音频数据段各自的声音局部特征进行处理,获得全局声音特征。
在一些实施例中,按照至少两个音频数据段的时域顺序,基于自注意力机制对至少两个音频数据段各自的声音局部特征进行处理,获得全局声音特征,包括:
按照至少两个音频数据段的时域顺序,基于自注意力机制对至少两个音频数据段各自的声音局部特征进行处理,获得至少两个音频数据段各自的注意力权重;
基于至少两个音频数据段各自的注意力权重,对至少两个音频数据段各自的声音局部特征进行加权处理,获得全局声音特征。
在一些实施例中,惯性传感器数据包含至少两个传感器数据段;基于惯性传感器数据的时序,对惯性传感器数据进行特征提取获得全局惯性传感器特征,包括:
对至少两个传感器数据段进行特征提取,获得至少两个传感器数据段各自的传感器局部特征;
按照至少两个传感器数据段的时域顺序,基于自注意力机制对至少两个传感器数据段各自的传感器局部特征进行处理,获得全局惯性传感器特征。
在一些实施例中,按照至少两个传感器数据段的时域顺序,基于自注意力机制对至少两个传感器数据段各自的传感器局部特征进行处理,获得全局惯性传感器特征,包括:
按照至少两个传感器数据段的时域顺序,基于自注意力机制对至少两个传感器数据段各自的传感器局部特征进行处理,获得至少两个传感器数据段各自的注意力权重;
基于至少两个传感器数据段各自的注意力权重,对至少两个传感器数据段各自的传感器局部特征进行加权处理,获得全局惯性传感器特征。
在一些实施例中,交通运行信息用于指示公共交通工具在目标时间段内的运行状态是否为停止状态;
基于交通运行信息执行到站提醒,包括:
在交通运行信息指示公共交通工具在目标时间段内的运行状态为停止状态的情况下,执行到站提醒。
在一些实施例中,在交通运行信息指示公共交通工具在目标时间段内的运行状态为停止状态的情况下,执行到站提醒,包括:
在交通运行信息指示公共交通工具在目标时间段内的运行状态为停止状态的情况下,获取公共交通工具的当前位置;
在公共交通工具的当前位置与目标路线上的指定站点相匹配的情况下,执行到站提醒;指定站点是目标路线上的目的站点或者换乘站点。
在一些实施例中,获取目标时间段内的环境音数据和目标时间段内的惯性传感器数据之前,还包括:
展示路线设置界面;根据用户在路线设置界面中设置的起始站点和目的站点,获取目标路线。
在一些实施例中,获取目标时间段内的环境音数据和目标时间段内的惯性传感器数据之前,还包括:
根据用户的历史移动轨迹进行路线预测,获取目标路线。
在一些实施例中,在交通运行信息用于指示公共交通工具在目标时间段内的运行状态为停止状态的情况下,执行到站提醒,包括:
在连续N次获取到的交通运行信息指示公共交通工具的运行状态为停止状态的情况下,执行到站提醒。
综上所述,本申请实施例中,通过实时采集环境音数据以及惯性传感器数据,并分别对环境音数据以及惯性传感器数据进行时序相关的全局特征提取,然后基于全局声音特征以及 全局惯性传感器特征,结合不同模态的特征之间的关系进行融合特征提取,避免了仅通过单一模态特征进行公共交通工具的运行状态判断时,受到外界影响导致准确性较差的情况,提高了公共交通工具的运行状态判断的准确性,进而提高了到站提醒的准确性。
示例性的,以交通运行信息是公共交通工具的启停状态,并且公共交通工具处于目的地站点以及中转站点时执行到站提醒为例,本申请实施例提供了一种到站提醒的方法,该到站提醒方法的流程如图3所示。终端在第一次使用到站提醒功能前,执行步骤301,存储公共交通工具线路图;当终端开启到站提醒功能时,首先执行步骤302,确定乘车路线;进入公共交通工具后,执行步骤303,通过麦克风实时获取环境音,并且通过终端的惯性传感器采集传感器数据;执行步骤304,通过采集到的环境音以及传感器数据,判断公共交通工具的启停状态,即判断公共交通工具是处于停止运行状态还是正在运行状态,当判断公共交通工具处于正在运行的状态时,继续执行步骤303,当判断公共交通工具处于停止运行状态时,可以确定公共交通工具进入某一站点,步骤305,结合乘车路线以及已行驶的站数,判断所在站点是否为目的地站点,若进入的站点为目的地站点,则执行步骤306,发送到站提醒,若所在站点不是目的地站点,则执行步骤307,判断所在站点是否为中转站点,若确定所在站点是中转站点,则执行步骤308,发送换乘提醒,若确定所在站点不是中转站点,则继续执行步骤303。
本申请实施例通过将在公共交通工具上获取到的声音以及在终端的惯性传感器采集到的惯性传感器数据进行结合,判断公共交通工具的启停状态,从而结合目标路线,确定公共交通工具处于停止状态时所处的站点,进而进行到站提醒,由于结合了声音以及惯性传感器采集数据两方面的特征对公共交通工具进行启停状态的判断,避免了单独利用声音数据进行启停状态判断时,受到麦克风堵塞等因素的影响,对公共交通工具启停状态判断出现错误的情况;也避免了单独利用惯性传感器采集的数据进行启停状态判断时,受到终端姿态的变化影响,导致的对公共交通工具启停状态判断不准确的情况;因此,结合这两种数据特征,使其进行互补,从而提高了公共交通工具启停判断算法的鲁棒性。
图4示出了本申请一个示例性实施例提供的到站提醒方法的流程图。其中,该到站提醒方法可以由终端执行,例如,该终端可以是具有声音采集功能以及惯性传感器数据采集功能的终端,例如,该终端可以是上述图1所示应用场景中的终端100。该到站提醒方法包括如下步骤:
步骤401,获取公共交通工具的目标路线。
在一种可能的实现方式中,上述公共交通工具的目标路线可以由用户进行设置,比如,终端可以展示路线设置界面,根据用户在路线设置界面中设置的起始站点和目的站点,获取目标路线。
也就是说,在本申请实施例中,终端可以在应用程序界面上展示路线设置界面,并通过接收用户在路线设置界面中设置的起始站点以及目的站点,生成从起始站点到目的站点的目标路线。
在一种可能的实现方式中,终端在展示上述路线设置界面时,可以实时获取用户位置信息,根据用户当前所处的位置信息确定起始站点。
或者,终端也可以根据用户在路线设置界面中对起始站点的选择操作,确定起始站点。类似的,终端可以根据用户在路线设置界面中对目的站点的选择操作,确定目的站点。
在获取到起始站点和目标站点之后,终端可以基于预先存储的公共交通工具的线路图,获取先后经过起始站点和目标站点的至少一条路线,从至少一条路线中确定目标路线。其中,终端可以对至少一条路线中的一条路线进行自动推荐,确定自动推荐的路线为目标路线,或者,终端也可以在界面上显示至少一条路线,通过接收用户的选择操作确定其中的目标路线。
示例性的,基于自动推荐确定目标路线时,可以获取至少一条路线中的起始站点到目标站点之间的间隔站数,将间隔站数最少的路线确定为目标路线,也可以获取公共交通工具通过至少一条路线从起始站点到目标站点所用的预计行驶时间,将预计行驶时间最短的路线确定为目标路线。
可选的,当用户使用支付类应用程序进行刷卡乘坐交通工具时,终端可以确认用户已经进入或者将要进入公共交通工具,此时可以开启到站提醒功能。
也就是说,在用户开启用于实现到站提醒的应用程序后,可以通过用户手动输入的方式,输入起始站点以及目的站点,并且选择合适的路线作为目标路线。
在另一种可能的实现方式中,终端也可以基于用户的行为习惯进行线路预测,确定目标路线。
其中,终端根据用户的历史移动轨迹进行路线预测,可以获取目标路线。历史移动轨迹可以是终端统计到的指定时间内用户的移动轨迹。
也就是说,终端将历史移动轨迹中的各条移动轨迹与公共交通工具的完整路线图进行匹配,获取各条移动轨迹在完整路线图中所覆盖的路线,若各个所覆盖的路线中存在指定路线,该指定路线的数量在全部所覆盖路线的数量中所占的比例大于指定阈值,则将该指定路线确定为目标路线(比如,存在3条历史移动轨迹,分别为移动轨迹A、移动轨迹B以及移动轨迹C,移动轨迹A在完整路线图上所覆盖的路线是从站点a到站点b的路线,移动轨迹B在完整路线图上所覆盖的路线是从站点b到站点c的路线,移动轨迹C在完整路线图上所覆盖的路线是从站点a到站点b的路线,由于移动轨迹A以及移动轨迹C均覆盖了站点a到站点b的路线,且在全部所覆盖路线中所占的比例大于1/2,则将站点a到站点b的路线确定为目标路线)。
另外,根据用户的历史移动轨迹进行路线预测,还可以通过机器学习模型预测的方式进行。
比如,将历史移动轨迹输入目标路线预测模型中,由目标路线预测模型输出目标路线。该目标路线预测模型可以是基于历史移动轨迹样本以及路线标签训练得到的神经网络模型。
示例性的,终端可以获取用户在当前时间之前的指定时间(比如一周或一个月)内的移动轨迹,通过用户的移动轨迹进行统计分析或者机器学习模型预测,获得用户将要乘坐的公共交通工具的目标路线。
步骤402,获取目标时间段内的环境音数据和目标时间段内的惯性传感器数据。
在本申请实施例中,在应用程序启动了到站提醒功能后,终端可以按照一定的周期,实时的进行音频和传感器数据的采集,获得目标时间段内的环境音数据和目标时间段内的惯性传感器数据。
在一种可能的实现方式中,终端每采集到一个目标时间段内的环境音数据以及惯性传感器数据之后,即可以执行一次交通运行信息的获取步骤。
示例性的,终端每次可以获取2s内采集到的环境音数据以及惯性传感器数据,作为目标时间段内采集到的数据。
步骤403,对至少两个音频数据段分别进行音频特征提取,获得至少两个音频数据段各自的梅尔频率倒谱系数特征。
在本申请实施例中,目标时间段内的环境音数据包含至少两个音频数据段,终端对至少两个音频数据段分别进行音频特征提取,得到至少两个音频数据段各自对应的梅尔频率到谱系数特征。
在一种可能的实现方式中,由于终端麦克风实时采集环境音数据,数据整体上并不是平稳的,但其局部可以看作平稳数据,所以对目标时间段内的环境音数据进行分帧处理,得到目标时间段内连续时序排列的至少两个音频数据段。
其中,环境音数据的采样频率可以是16kHz,而惯性传感器数据的采样频率可以是200Hz, 由于环境音数据的采样频率远远高于惯性传感器数据的采样频率,因此,终端可以对环境音数据进行初步的特征提取,使得目标时间段内的环境音数据的特征可以与目标时间段内的惯性传感器数据的特征在数量上相匹配。
在一种可能的实现方式中,终端可以对至少两个音频数据段进行初步特征提取,得到初步音频特征,初步音频特征包括各个音频数据段的梅尔频率倒谱系数(Mel-Frequency Cepstral Coefficients,MFCC)。
其中,图5是本申请实施例涉及的一种梅尔频率倒谱系数提取流程图。如图5所示,进行梅尔频率倒谱系数的提取可以包括如下步骤:
首先,音频数据段经过预加重模块501进行预加重处理,预加重模块可以采用高通滤波器,其只允许高于某一频率的信号分量通过,而抑制低于该频率的信号分量,从而去除音频数据段中人的交谈声、脚步声和机械噪音等不必要的低频干扰,使音频数据段的频谱变得平坦。高通滤波器的数学表达式为:
H(z)=1-az -1
其中,a是修正系数,一般取值范围为0.95至0.97,z是音频数据段的音频信号。
将去除噪音后的音频数据段通过分帧加窗模块502进行分帧处理,得到不同音频帧对应的音频数据。
示意性的,本实施例中可以将包含512个数据点的音频数据划分为一帧,当音频数据的采样频率选取为16kHz时,一帧音频数据的时长为32ms。为了避免两帧数据之间的变化过大,同时也为了避免加窗处理后音频帧两端的数据丢失,本申请可以在每取完一帧数据后,向后滑动16ms再取下一帧数据,即相邻两帧数据重叠16ms。
由于分帧处理后的音频数据在后续特征提取时需要进行离散傅里叶变换,而一帧音频数据没有明显的周期性,即帧左端和帧右端不连续,经过傅里叶变换后与原始数据会产生误差,分帧越多误差越大,为了使分帧后的音频数据连续,且每一帧音频数据表现出周期函数的特征,本申请实施例所示的方案通过分帧加窗模块502进行分帧加窗处理。
在一种可能的实施方式中,终端可以采用汉明窗对音频帧进行加窗处理。其中,将每一帧音频数据乘以汉明窗函数,得到的音频数据就有了明显的周期性。汉明窗的函数形式为:
Figure PCTCN2022124453-appb-000001
其中,n为整数,n的取值范围为0至M,M是傅里叶变换的点数,示意性的,本实施例取512个数据点作为傅里叶变换的点数。
在一种可能的实现方式中,由于从音频信号在时域上的变换中很难得到其信号特性,通常需要把时域信号转换为频域上的能量分布来处理,因此终端先将音频帧数据输入傅里叶变换模块503进行傅里叶变换,然后将傅里叶变换后的音频帧数据输入能量谱计算模块504,计算音频帧数据的能量谱。为了将其能量谱转化为符合人耳听觉的梅尔谱,需要将能量谱输入梅尔滤波处理模块505进行滤波处理,滤波处理的数学表达式为:
Figure PCTCN2022124453-appb-000002
其中,f为傅里叶变换后的频点。
得到音频帧的梅尔谱之后,终端通过离散余弦变换(Discrete Cosine Transform,DCT)模块506对其取对数,得到的DCT系数即为MFCC特征。
示意性的,本申请实施例可以选取64维的MFCC特征,终端在实际提取特征时,音频数据的输入窗口长度可以选为200ms,而一帧信号的时间长度为32ms,相邻两帧数据之间有16ms的重叠部分,因此每一个200ms的输入窗口数据对应生成的特征为12*64的矩阵。
步骤404,对至少两个音频数据段各自的梅尔频率倒谱系数特征进行特征提取,获得至少两个音频数据段各自的声音局部特征。
在本申请实施例中,至少两个音频数据段各自经过上述的MFCC特征提取,得到各自对应的MFCC特征,对至少两个音频数据段各自对应的MFCC特征分别进行局部特征提取,可以得到至少两个音频数据段各自对应的声音局部特征。
在一种可能的实现方式中,终端可以通过第一卷积神经网络对各个音频数据段对应的MFCC特征进行局部特征提取,获取各个音频数据段的声音局部特征。
其中,第一卷积神经网络是用于提取各个音频数据段的局部特征的卷积神经网络(Convolutional Neural Network,CNN),利用该卷积神经网络进行局部特征提取可以去除各个音频特征数据段的MFCC特征中的冗余的特征信息。
示例性的,若目标时间段是2s,并且各个音频数据段均为200ms,则目标时间段内采集到的环境音数据可以分为10个200ms的音频数据段,各个音频数据段提取得到的MFCC特征分别经过第一卷积神经网络进行局部特征提取,得到各个音频数据段对应的声音局部特征。
步骤405,按照至少两个音频数据段的时域顺序,基于自注意力机制对至少两个音频数据段各自的声音局部特征进行处理,获得全局声音特征。
在本申请实施例中,终端按照时域上从先到后的顺序,对各个音频数据段对应的声音局部特征进行自注意力处理,得到目标时间段内的环境音数据对应的一个全局声音特征。
也就是说,该全局声音特征是基于各个音频数据段之间的时序关系,将各个音频数据段各自的声音局部特征进行融合后得到的一个全局特征。
在一种可能的实现方式中,终端可以按照至少两个音频数据段的时域顺序,基于自注意力机制对至少两个所述音频数据段各自的声音局部特征进行处理,获得至少两个所述音频数据段各自的注意力权重;然后,基于至少两个音频数据段各自的注意力权重,对至少两个音频数据段各自的声音局部特征进行加权处理,获得全局声音特征。
其中,终端可以将至少两个音频数据段各自的声音局部特征按照时域顺序输入第一自注意力网络,并且由第一自注意力网络进行时序相关的全局特征提取,得到全局声音特征。
示例性的,若目标时间段是2s,并且各个音频数据段均为200ms,终端将第一卷积神经网络输出的10个音频数据段分别对应的声音局部特征,按照时域顺序输入第一自注意力网络中,第一自注意力网络基于自注意力机制,确定各个音频数据段对应的注意力权重,将10个音频数据段的声音局部特征与各自对应的注意力权重进行加权处理,得到全局声音特征。
其中,基于至少两个音频数据段各自的注意力权重,可以对至少两个音频数据段各自的声音局部特征进行加权求和处理或者加权拼接处理,获得全局声音特征。
例如,以A、B、C这三个声音局部特征为例,假设基于自注意力机制得到的注意力权重为(0.2,0.6,0.2),一种情况下,将3个声音局部特征与各自对应的注意力权重进行加权求和处理,则全局声音特征D=A*0.2+B*0.6+C*0.2。
在另一种可能的实现方式中,终端也可以将各个音频数据段对应的注意力权重,与各个音频数据段对应的声音局部特征相乘后拼接,作为全局声音特征。
比如,当声音局部特征为A、B、C时,基于自注意力机制得到的注意力权重为(0.1,0.3,0.6),则全局声音特征D为(A*0.2,B*0.6,C*0.2)。
步骤406,对至少两个传感器数据段进行特征提取,获得至少两个传感器数据段各自的传感器局部特征。
在本申请实施例中,目标时间段内采集到的惯性传感器数据包含至少两个传感器数据段,对至少两个传感器数据段分别进行局部特征提取,可以得到至少两个传感器数据段各自对应的传感器局部特征。
在一种可能的实现方式中,终端可以通过第二卷积神经网络对各个传感器数据段进行局部特征提取,获取各个传感器数据段的传感器局部特征。
其中,第二卷积神经网络是用于提取各个传感器数据段的局部特征的卷积神经网络,利用该卷积神经网络进行局部特征提取可以去除各个传感器数据段中的冗余的特征信息。
示例性的,若目标时间段是2s,并且各个传感器数据段均为200ms,则目标时间段内采集到的惯性传感器数据可以分为10个200ms的传感器数据段,各个传感器数据段分别经过第二卷积神经网络进行局部特征提取,得到各个传感器数据段对应的传感器局部特征。
步骤407,按照至少两个传感器数据段的时域顺序,基于自注意力机制对至少两个传感器数据段各自的传感器局部特征进行处理,获得全局惯性传感器特征。
在本申请实施例中,终端按照时域上从先到后的顺序,对各个传感器数据段对应的声音局部特征进行自注意力处理,得到目标时间段内的惯性传感器数据对应的一个全局惯性传感器特征。
也就是说,该全局惯性传感器特征是基于各个传感器数据段之间的时序关系,将各个传感器数据段各自的传感器局部特征进行融合后得到的一个全局特征。
在一种可能的实现方式中,终端可以按照至少两个传感器数据段的时域顺序,基于自注意力机制对至少两个传感器数据段各自的传感器局部特征进行处理,获得至少两个传感器数据段各自的注意力权重;然后,基于至少两个传感器数据段各自的注意力权重,对至少两个传感器数据段各自的传感器局部特征进行加权处理,获得全局惯性传感器特征。
其中,终端可以将至少两个传感器数据段各自的传感器局部特征按照时域顺序输入第二自注意力网络,并且由第二自注意力网络进行时序相关的全局特征提取,得到全局惯性传感器特征。
示例性的,若目标时间段是2s,并且各个惯性传感器数据段均为200ms,终端将第二卷积神经网络输出的10个传感器数据段分别对应的惯性传感器局部特征,按照时域顺序输入第二自注意力网络中,第二自注意力网络基于自注意力机制,确定各个传感器数据段对应的注意力权重,将10个传感器数据段的惯性传感器局部特征与各自对应的注意力权重进行加权处理,得到全局惯性传感器特征。
其中,基于至少两个传感器数据段各自的注意力权重,可以对至少两个传感器数据段各自的惯性传感器局部特征进行加权求和处理或者加权拼接处理,获得全局惯性传感器特征。
例如,以X、Y、Z这三个惯性传感器局部特征为例,假设基于自注意力机制得到的注意力权重为(0.1,0.3,0.6),一种情况下,将3个惯性传感器局部特征与各自对应的注意力权重进行加权求和处理,则全局惯性传感器特征W=X*0.1+Y*0.3+Z*0.6。
在另一种可能的实现方式中,终端也可以将各个惯性传感器数据段对应的注意力权重,与各个惯性传感器数据段对应的传感器局部特征相乘后拼接,作为全局惯性传感器特征。
比如,当惯性传感器局部特征为X、Y、Z时,基于自注意力机制得到的注意力权重为(0.1,0.3,0.6),则全局惯性传感器特征W为(X*0.1,Y*0.3,Z*0.6)。
步骤408,将全局声音特征和全局惯性传感器特征进行拼接。
在本申请实施例中,终端将全局声音特征以及全局惯性传感器特征进行特征拼接,得到拼接后的全局特征。
在一种可能的实现方式中,全局声音特征包括目标时间段内的至少两个时间段各自对应的全局声音子特征;全局惯性传感器特征包括目标时间段内至少两个时间段各自对应的全局惯性传感器子特征。将目标时间段内的至少两个时间段各自对应的全局声音子特征以及全局惯性传感器子特征进行拼接。
示例性的,若全局声音特征为D,全局惯性传感器特征为W,则拼接后的全局特征可以是(D,W);若D为(A*0.2,B*0.6,C*0.2),则全局声音子特征可以为A*0.2、B*0.6以及C*0.2;若W为(X*0.1,Y*0.3,Z*0.6),则全局惯性传感器子特征可以为X*0.1、Y*0.3以及Z*0.6。
其中,全局声音子特征的维度数量与全局惯性传感器子特征的维度数量相同。
在一种可能的实现方式中,全局声音子特征各自的维度数量由第一自注意力网络的输出特征维度确定,全局惯性传感器子特征各自的维度数量由第二自注意力网络的输出特征维度确定。
示例性的,第一卷积神经网络提取各个音频数据段的声音局部特征,若该声音局部特征的特征维度为N,且目标时间段中包含有10个音频数据段,则可以由第一卷积神经网络输出10个局部特征构成的10×N的声音局部特征向量,将10×N的声音局部特征向量输入第一自注意力网络中,提取得到10×N的全局声音特征向量。第二卷积神经网络提取各个传感器数据段的传感器局部特征,若该传感器局部特征的特征维度为N,且目标时间段中包含有10个传感器数据段,则可以由第二卷积神经网络输出10个局部特征构成的10×N的传感器局部特征向量,将10×N的传感器局部特征向量输入第二自注意力网络中,提取得到10×N的全局惯性传感器特征向量。将10×N的全局声音特征向量以及10×N的全局惯性传感器特征向量按行进行堆叠,得到20×N的拼接后的全局特征向量,将20×N的拼接后的全局特征向量输入第三自注意力网络中,最终得到20×N的融合特征向量,其中,20为时间长度,20×N的融合特征向量指示20个N维向量特征。
步骤409,基于自注意力机制对拼接后的全局声音特征和全局惯性传感器特征进行处理,获得全局声音特征和全局惯性传感器特征各自的注意力权重,基于全局声音特征和全局惯性传感器特征各自的注意力权重,获取融合特征。
在本申请实施例中,终端通过自注意力机制对拼接后的全局特征进行处理,得到全局特征中全局声音特征以及全局惯性传感器特征这两种模态各自对应的注意力权重。终端基于两种模态分别对应的注意力权重,对两种模态的全局特征进行特征融合,得到融合特征。
在一种可能的实现方式中,基于自注意力机制对拼接后的全局声音特征和全局惯性传感器特征进行处理,获得全局声音子特征各自的注意力权重和全局惯性传感器子特征各自的注意力权重;基于全局声音子特征各自的注意力权重和全局惯性传感器子特征各自的注意力权重,获取融合特征。
其中,各个时间段对应的全局声音子特征可以是各个音频数据段对应的声音局部特征在基于自注意力机制进行处理后得到的特征。各个时间段对应的全局惯性传感器子特征可以是各个传感器数据段对应的传感器局部特征在基于自注意力机制进行处理后得到的特征。
拼接后的全局声音特征和全局惯性传感器特征通过第三自注意力网络,可以提取不同模态之间的关系,并且考虑到了时序对融合特征的影响,进行全局特征提取得到融合特征。
在一种可能的实现方式中,终端基于全局声音特征和全局惯性传感器特征各自的注意力权重,对全局声音特征和全局惯性传感器特征进行加权求和或者加权平均,获得融合特征。
其中,拼接后的全局声音特征和全局惯性传感器特征通过第三自注意力网络,可以提取不同模态之间的关系,进行全局特征提取得到融合特征。
示例性的,基于自注意力机制可以确定全局声音特征以及全局惯性传感器特征各自对应的注意力权重分别为0.2以及0.8,若拼接后的全局特征是(D,W),通过上述方法进行加权求和得到的融合特征为D*0.2+W*0.8,通过上述方法进行加权平均得到加权平均结果为E=(D*0.2+W*0.8)/2,得到的融合特征为(E,E)。
在一种可能的实现方式中,基于自注意力机制对拼接后的全局声音特征和全局惯性传感器特征进行处理,获得全局声音子特征各自的注意力权重和全局惯性传感器子特征各自的注意力权重;基于全局声音子特征各自的注意力权重和全局惯性传感器子特征各自的注意力权重,获取融合特征。
在一种可能的实现方式中,基于全局声音子特征和全局惯性传感器子特征各自的注意力权重,对全局声音子特征和全局惯性传感器子特征进行加权求和或者加权平均,获得融合特征。
示例性的,若拼接后的全局特征是(A,B,X,Y),基于自注意力机制可以确定全局声 音子特征以及全局惯性传感器子特征各自对应的注意力权重分别为0.2、0.3、0.1以及0.5,通过上述方法乘以各自注意力权重之后进行求和,得到的融合特征为A*0.2+B*0.3+X*0.1+Y*0.5。通过上述方法乘以各自注意力权重之后进行求和平均,加权平均结果为E=(A*0.2+B*0.3+X*0.1+Y*0.5)/4。由此可见,得到的融合特征为(E,E,E,E)。
在另一种可能的实现方式中,终端也可以基于全局声音特征和全局惯性传感器特征各自的注意力权重,对全局声音特征和全局惯性传感器特征乘以各自的注意力权重后,将乘以各自注意力权重之后的两种特征进行拼接后作为融合特征。
示例性的,基于自注意力机制可以确定全局声音特征以及全局惯性传感器特征各自对应的注意力权重分别为0.2以及0.8,若拼接后的全局特征是(D,W),则融合特征为(D*0.2,W*0.6)。
例如,若拼接后的全局特征是(A*0.2,B*0.6,C*0.2,X*0.1,Y*0.3,Z*0.6),通过上述方法乘以各自注意力权重之后的两种特征进行拼接得到的融合特征为(A*0.2*0.2,B*0.6*0.2,C*0.2*0.2,X*0.1*0.8,Y*0.3*0.8,Z*0.6*0.8)。
可选的,在本申请实施例中,终端基于两种模态分别对应的注意力权重,对两种模态的全局特征进行特征融合时,还可以进一步结合环境音数据的平滑参数进行特征融合。
在一种可能的实现方式中,对于环境音数据中的至少两个音频数据段,终端可以获取至少两个音频数据段各自的音量均值;基于至少两个音频数据段各自的音量均值,获取环境音数据的音量均值;然后根据环境音数据的音量均值,以及至少两个音频数据段各自的音量均值,获取环境音数据的音量的平滑参数,其中,该平滑参数用于指示环境音数据的音量的平滑程度。在对两种模态的全局特征进行特征融合之前,终端可以根据环境音数据的音量的平滑参数,获取全局声音特征的调整系数,通过调整系数与全局声音特征相乘,得到调整后的全局声音特征,后续在对两种模态的全局特征进行特征融合时,可以将调整后的全局声音特征和全局惯性传感器特征进行拼接。
其中,该调整系数与环境音数据的音量的平滑参数呈负相关,也就是说,环境音数据的音量越平滑,平滑参数越小,调整系数越大;相应的,环境音数据的音量的平滑参数越大,
调整系数越小。可选的,上述平滑参数可以是标准差或者方差等表示数据集的离散程度的参数。
由于公共交通工具在运行过程中,车厢内通常会产生不规律的环境噪音,比如突然发生的吵嚷声等等,这些不规律的环境噪音可能会对全局声音特征的准确性造成影响,对此,本申请实施例所示的方案中,终端在对全局声音特征和全局惯性传感器特征进行融合之前,可以先根据环境音数据的音量的平滑程度对全局声音特征进行抑制或者增强,从而动态的调节全局声音特征在融合特征中的比重。比如,环境音数据的平滑参数较高,说明环境音数据中包含的不规律噪音越多,对后续到站检测的影响较大,此时,可以通过一个较小的调整系数(比如0.9)对全局声音特征进行抑制,以降低全局声音特征在融合特征中的比重;反之,环境音数据的平滑参数较低,说明环境音数据中包含的不规律噪音越少,对后续到站检测的影响较小,此时,可以通过一个较大的调整系数(比如1.1)对全局声音特征进行增强,以提高全局声音特征在融合特征中的比重。
通过上述结合环境音数据的平滑参数进行特征融合的方案,可以对全局声音特征在融合特征中的比重进行灵活的调整,进一步提高后续到站提醒判断的准确性。
步骤410,基于融合特征获取交通运行信息。
其中,交通运行信息用于指示公共交通工具在目标时间段内的运行状态。
在一种可能的实现方式中,终端可以通过全连接网络以及分类器对融合特征进行分类处理,输出公共交通工具的交通运行信息,从确定公共交通工具在目标时间段内的运行状态。
示例性的,图6是本申请实施例涉及的一种分类模型架构图。如图6所示,终端中存储有该分类模型,用于基于终端采集到的环境音数据以及惯性传感器数据,判断公共交通工具 的运行状态,该分类模型中包括第一卷积网络层61、第二卷积网络层62、第一自注意力网络层63、第二自注意力网络层64、第三自注意力网络层65、全连接网络层66以及分类器67。第一卷积网络层61用于对环境音数据进行局部特征提取,2s的环境音数据分为10个200ms的音频数据段,对10个200ms的音频数据段通过特征提取模块计算MFCC特征提取,然后依次将提取到的MFCC特征输入第一卷积神经网络层61,进行局部特征提取得到各个音频数据段对应的声音局部特征,然后,按照时域顺序将各个声音局部特征输入第一自注意力网络层63,通过第一自注意力网络层63的时序相关的全局特征提取,得到全局声音特征。同时,第二卷积网络层62用于对惯性传感器数据进行局部特征提取,2s的惯性传感器数据分为10个200ms的传感器数据段,将10个200ms的传感器数据段依次输入第二卷积神经网络层62,进行局部特征提取得到各个传感器数据段对应的惯性传感器局部特征,然后,按照时域顺序将各个惯性传感器局部特征输入第二自注意力网络层64,通过第二自注意力网络层64的时序相关的全局特征提取,得到全局惯性传感器特征。将2s内得到的全局惯性传感器特征以及全局声音特征输入第三自注意力网络层65,该第三自注意力网络层65用于对多模态进行自注意力权重分配提取全局特征,由第三自注意力网络层65输出融合特征,将融合特征输入全连接网络66以及分类器67中,输出该融合特征对应的交通运行信息。
其中,分类器67可以采取不同算法的分类器,比如SVM(Support Vector Machine,支持向量机)、决策树分类模型算法以及二分类模型算法等。
在一种可能的实现方式中,上述分类模型的训练过程可以如下:模型训练设备获取样本时间段内的样本环境音数据以及样本惯性传感器数据;将样本环境音数据以及样本惯性传感器数据输入分类模型,获得分类模型输出的预测交通运行信息;基于预测交通运行信息,以及样本环境音数据以及样本惯性传感器数据对应的交通运行信息标签获取损失函数值;基于损失函数值对分类模型的模型参数进行更新。
步骤411,基于交通运行信息执行到站提醒。
在本申请实施例中,终端通过获取到的目标时间段内的交通运行信息确定目标时间段内的公共交通工具的运行情况,基于获取到的公共交通工具的运行情况执行到站提醒。
其中,交通运行信息用于指示公共交通工具在目标时间段内的运行状态是否为停止状态。在交通运行信息指示公共交通工具在目标时间段内的运行状态为停止状态的情况下,执行到站提醒。
在一种可能的实现方式中,在交通运行信息指示公共交通工具在目标时间段内的运行状态为停止状态的情况下,终端可以获取公共交通工具的当前位置;在公共交通工具的当前位置与目标路线上的指定站点相匹配的情况下,执行到站提醒。
其中,指定站点可以是目标路线上的目的站点或者换乘站点。
在获取公共交通工具的当前位置时,若公共交通工具为地铁或高铁等轨道交通工具,则终端可以结合本次乘坐公共交通工具过程中的历次停止情况,确定当前位置对应的站点。
示例性的,响应于检测到公共交通工具第i次停止(比如,获取到公共交通工具处于停止状态的持续时长大于等于第一阈值,则认为公共交通工具停止),终端确定公共交通工具在目标线路上的所处站点为起始站点之后的第i个站点;响应于第i个站点或者第i+1个站点为指定站点,确定指定站点的站点种类,站点种类包括目的站点以及换乘站点;基于指定站点的站点种类,获取与站点种类对应的提醒信息;基于提醒信息进行到站提醒。
或者,终端也可以在检测到公共交通工具的运行状态为停止状态时,获取公共交通工具的当前位置,结合公共交通工具的当前位置以及目标路线中各个站点的位置,确定与公共交通工具的当前位置相对应的站点。
在获取公共交通工具的当前位置时,终端还可以通过定位系统获取终端的当前位置,或者,可以通过惯性传感器数据确定终端的当前位置。
由于公共交通工具可能在非站点的位置停车,比如,公交车可能会因为等红灯等原因停 车,即便是轨道交通工具,也可能会因为路线调度或者路线故障等原因在非站点位置停车,因此,仅通过公共交通工具的停止次数无法准确的确定公共交通工具所在的站点。对此,本申请实施例所示的方法还可以通过定位系统获取终端的当前位置(比如通过卫星定位、蜂窝网络定位、无线接入点定位等方式获取当前位置);可选的,公共交通工具的目标路线上可能存在无法通过定位系统进行定位的位置(比如无信号的地下轨道中),此时,终端还可以从上一次通过定位系统获取到的终端位置开始,通过惯性传感器数据确定终端的移动轨迹,并结合终端内置的地图数据以及上述移动轨迹,确定终端的当前位置。
在一种可能的实现方式中,在连续N次获取到的交通运行信息指示公共交通工具的运行状态为停止状态的情况下,执行到站提醒。
示例性的,图7是本申请实施例涉及的一种到站判断方法的流程图,如图7所示。首先执行步骤71,采集目标时间段内的环境音数据,然后执行步骤72,将环境音数据进行音频特征前处理,提取环境音数据对应的各个音频数据段的MFCC特征,即上述实施例中的每一个窗口200ms的数据对应的特征为12*64的矩阵,然后执行步骤73,将提取到的各个音频数据段的MFCC特征与采集到的惯性传感器数据输入卷积神经网络以及自注意力网络进行多模态特征融合的网络模型结构,其中,环境音数据和惯性传感器数据均为2s,自注意力网络具有提取时序特征的特性,因此将2s数据各拆分为10个200ms的数据段,每200ms为一个独立帧的数据,CNN提取独立帧的数据局部特征,之后形成10个局部特征,再输入到自注意力网络中提取时序相关特征,可以更好的提取数据整体特征,从而结合了CNN与自注意力网络各自的优势,即CNN更擅长提取局部特征,自注意力网络更擅长提取全局时序特征。得到融合特征后,判断目标时间段内是否处于停止状态,若处于停止状态则执行步骤74,持续检测公共交通工具的运行状态,比如,若地铁到站停止时间一般有20s以上,本申请实施例输入模型的数据窗口长度为2秒,可以持续检测地铁与运行状态,若连续检测到5次地铁处于停止状态,则认为地铁到达了一个站点。
其中,若公共交通工具进入站点需要连续处于停止状态20s或以上,上述实施例中的目标时间段为2s,在连续5次获取到交通运行信息指示公共交通工具的运行状态为停止状态的情况下,确定当前所处站点是否为指定站点,若确定当前所处站点为指定站点,则执行与指定站点对应的到站提醒。
比如,若终端判断用户到达目标站点,则终端提醒用户即将到站注意下车,并可以推送该指定站点附近的地图信息,若终端判断用户到达换乘站点,则终端提醒用户进行换乘,并可以提醒换乘车辆的首末班时间信息。
综上所述,本申请实施例中,通过实时采集环境音数据以及惯性传感器数据,并分别对环境音数据以及惯性传感器数据进行时序相关的全局特征提取,然后基于全局声音特征以及全局惯性传感器特征,结合不同模态之间的关系进行融合特征提取,避免了仅通过单一模态特征进行公共交通工具的运行状态判断时,受到外界影响导致准确性较差的情况,由于提高了公共交通工具的运行状态判断的准确性,进而提高了到站提醒的准确性。
图8示出了本申请一个示例性实施例提供的到站提醒装置的结构框图。该到站提醒装置用于执行上述图2或图4所示的方案中,由终端执行的全部或者部分步骤,该到站提醒装置包括:
数据获取模块810,用于获取目标时间段内的环境音数据和所述目标时间段内的惯性传感器数据;
第一特征提取模块820,用于基于所述环境音数据的时序对所述环境音数据进行特征提取,获得全局声音特征;
第二特征提取模块830,用于基于所述惯性传感器数据的时序对所述惯性传感器数据进行特征提取,获得全局惯性传感器特征;
特征融合模块840,用于基于自注意力机制对所述全局声音特征和所述全局惯性传感器特征进行融合处理,获得融合特征;
信息获取模块850,用于基于所述融合特征获取交通运行信息;所述交通运行信息用于指示公共交通工具在所述目标时间段内的运行状态;
提醒模块860,用于基于所述交通运行信息执行到站提醒。
在一种可能的实现方式中,所述特征融合模块840,包括:
特征拼接子模块,用于将所述全局声音特征和所述全局惯性传感器特征进行拼接;
权重获取子模块,用于基于自注意力机制对拼接后的所述全局声音特征和所述全局惯性传感器特征进行处理,获得所述全局声音特征和所述全局惯性传感器特征各自的注意力权重;
特征融合子模块,用于基于所述全局声音特征和所述全局惯性传感器特征各自的注意力权重,获取所述融合特征。
在一种可能的实现方式中,所述全局声音特征包括所述目标时间段内的至少两个时间段各自对应的全局声音子特征;所述全局惯性传感器特征包括所述目标时间段内至少两个时间段各自对应的全局惯性传感器子特征;
所述特征拼接子模块,包括:
拼接单元,用于将所述目标时间段内的至少两个时间段各自对应的所述全局声音子特征以及所述全局惯性传感器子特征进行拼接;所述全局声音子特征的维度数量,与所述全局惯性传感器子特征的维度数量相同;
所述权重获取子模块,包括:
权重单元,用于基于自注意力机制对拼接后的所述全局声音特征和所述全局惯性传感器特征进行处理,获得所述全局声音子特征各自的注意力权重和所述全局惯性传感器子特征各自的注意力权重;
所述特征融合子模块,包括:
融合特征获取单元,用于基于所述全局声音子特征各自的注意力权重和所述全局惯性传感器子特征各自的注意力权重,获取所述融合特征。
在一种可能的实现方式中,所述融合特征获取单元,用于,
基于所述全局声音子特征各自的注意力权重和所述全局惯性传感器子特征各自的注意力权重,对所述全局声音子特征和所述全局惯性传感器子特征进行加权求和或者加权平均,获得所述融合特征。
在一种可能的实现方式中,所述环境音数据包含至少两个音频数据段;
所述第一特征提取模块820,包括:
第一提取子模块,用于对至少两个所述音频数据段分别进行音频特征提取,获得至少两个所述音频数据段各自的梅尔频率倒谱系数特征;
第一局部获取子模块,用于对至少两个所述音频数据段各自的梅尔频率倒谱系数特征进行特征提取,获得至少两个所述音频数据段各自的声音局部特征;
第一全局获取子模块,用于按照至少两个所述音频数据段的时域顺序,基于自注意力机制对至少两个所述音频数据段各自的声音局部特征进行处理,获得所述全局声音特征。
在一种可能的实现方式中,所述第一全局获取子模块,包括:
第一权重获取单元,用于按照至少两个所述音频数据段的时域顺序,基于自注意力机制对至少两个所述音频数据段各自的声音局部特征进行处理,获得至少两个所述音频数据段各自的注意力权重;
第一全局获取单元,用于基于至少两个所述音频数据段各自的注意力权重,对至少两个所述音频数据段各自的声音局部特征进行加权处理,获得所述全局声音特征。
在一种可能的实现方式中,所述惯性传感器数据包含至少两个传感器数据段;
所述第二特征提取模块830,包括:
第二局部获取子模块,用于对至少两个所述传感器数据段进行特征提取,获得至少两个所述传感器数据段各自的传感器局部特征;
第二全局获取子模块,用于按照至少两个所述传感器数据段的时域顺序,基于自注意力机制对至少两个所述传感器数据段各自的传感器局部特征进行处理,获得所述全局惯性传感器特征。
在一种可能的实现方式中,所述第二全局获取子模块,包括:
第二权重获取单元,用于按照至少两个所述传感器数据段的时域顺序,基于自注意力机制对至少两个所述传感器数据段各自的传感器局部特征进行处理,获得至少两个所述传感器数据段各自的注意力权重;
第二全局获取单元,用于基于至少两个所述传感器数据段各自的注意力权重,对至少两个所述传感器数据段各自的传感器局部特征进行加权处理,获得所述全局惯性传感器特征。
在一种可能的实现方式中,所述交通运行信息用于指示公共交通工具在所述目标时间段内的运行状态是否为停止状态;
所述提醒模块860,包括:
提醒子模块,用于在所述交通运行信息指示公共交通工具在所述目标时间段内的运行状态为停止状态的情况下,执行到站提醒。
在一种可能的实现方式中,所述提醒子模块,包括:
位置获取单元,用于在所述交通运行信息指示公共交通工具在所述目标时间段内的运行状态为停止状态的情况下,获取所述公共交通工具的当前位置;
提醒单元,用于在所述公共交通工具的当前位置与目标路线上的指定站点相匹配的情况下,执行到站提醒;所述指定站点是所述目标路线上的目的站点或者换乘站点。
在一种可能的实现方式中,所述装置还包括:
界面展示模块,用于获取目标时间段内的环境音数据和所述目标时间段内的惯性传感器数据之前,展示路线设置界面;
目标路线获取模块,用于根据用户在所述路线设置界面中设置的起始站点和目的站点,,获取所述目标路线。
在一种可能的实现方式中,所述终端还包括:
目标路线获取模块,用于根据用户的历史移动轨迹进行路线预测,获取所述目标路线。
在一种可能的实现方式中,所述提醒子模块,包括:
到站提醒单元,用于在连续N次获取到的所述交通运行信息指示公共交通工具的运行状态为停止状态的情况下,执行到站提醒。
综上所述,本申请实施例中,通过实时采集环境音数据以及惯性传感器数据,并分别对环境音数据以及惯性传感器数据进行时序相关的全局特征提取,然后基于全局声音特征以及全局惯性传感器特征,结合不同模态之间的关系进行融合特征提取,避免了仅通过单一模态特征进行公共交通工具的运行状态判断时,受到外界影响导致准确性较差的情况,由于提高了公共交通工具的运行状态判断的准确性,进而提高了到站提醒的准确性。
图9示出了本申请一个示例性实施例提供的终端的结构方框图。该终端可以是智能手机、平板电脑、电子书、便携式个人计算机等安装并运行有应用程序的电子设备。本申请中的终端可以包括一个或多个如下部件:处理器910、存储器920和屏幕930。
处理器910可以包括一个或者多个处理核心。处理器910利用各种接口和线路连接整个终端内的各个部分,通过运行或执行存储在存储器920内的指令、程序、代码集或指令集,以及调用存储在存储器920内的数据,执行终端的各种功能和处理数据。可选地,处理器910可以采用数字信号处理(Digital Signal Processing,DSP)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)、可编程逻辑阵列(Programmable Logic Array,PLA)中的 至少一种硬件形式来实现。处理器910可集成中央处理器(Central Processing Unit,CPU)、图像处理器(Graphics Processing Unit,GPU)和调制解调器等中的一种或几种的组合。其中,CPU主要处理操作系统、用户界面和应用程序等;GPU用于负责屏幕930所需要显示的内容的渲染和绘制;调制解调器用于处理无线通信。可以理解的是,上述调制解调器也可以不集成到处理器910中,单独通过一块通信芯片进行实现。
存储器920可以包括随机存储器(Random Access Memory,RAM),也可以包括只读存储器(Read-Only Memory,ROM)。可选地,该存储器920包括非瞬时性计算机可读介质(non-transitory computer-readable storage medium)。存储器920可用于存储指令、程序、代码、代码集或指令集。存储器920可包括存储程序区和存储数据区,其中,存储程序区可存储用于实现操作系统的指令、用于实现至少一个功能的指令(比如触控功能、声音播放功能、图像播放功能等)、用于实现上述各个方法实施例的指令等,该操作系统可以是安卓(Android)系统(包括基于Android系统深度开发的系统)、IOS系统(包括基于IOS系统深度开发的系统)或其它系统。存储数据区还可以存储终端在使用中所创建的数据(比如电话本、音视频数据、聊天记录数据)等。
屏幕930可以为电容式触摸显示屏,该电容式触摸显示屏用于接收用户使用手指、触摸笔等任何适合的物体在其上或附近的触摸操作,以及显示各个应用程序的用户界面。除此之外,本领域技术人员可以理解,上述附图所示出的终端的结构并不构成对终端的限定,终端可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。本申请实施例还提供了一种计算机可读存储介质,该计算机可读存储介质中存储有至少一条计算机指令,该至少一条计算机指令由处理器加载并执行以实现如上各个实施例所述的到站提醒方法。
根据本申请的一个方面,提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。终端的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该终端执行上述方面的各种可选实现方式中提供的到站提醒方法。

Claims (17)

  1. An arrival reminder method, wherein the method is executed by a terminal, and the method comprises:
    acquiring environmental sound data within a target time period and inertial sensor data within the target time period;
    performing feature extraction on the environmental sound data based on a time sequence of the environmental sound data to obtain global sound features;
    performing feature extraction on the inertial sensor data based on a time sequence of the inertial sensor data to obtain global inertial sensor features;
    fusing the global sound features and the global inertial sensor features based on a self-attention mechanism to obtain fusion features;
    acquiring traffic operation information based on the fusion features, the traffic operation information being used to indicate an operating state of a public transport vehicle within the target time period; and
    executing an arrival reminder based on the traffic operation information.
  2. The method according to claim 1, wherein the fusing the global sound features and the global inertial sensor features based on a self-attention mechanism to obtain fusion features comprises:
    splicing the global sound features and the global inertial sensor features;
    processing the spliced global sound features and global inertial sensor features based on the self-attention mechanism to obtain respective attention weights of the global sound features and the global inertial sensor features; and
    acquiring the fusion features based on the respective attention weights of the global sound features and the global inertial sensor features.
  3. The method according to claim 2, wherein the global sound features comprise global sound sub-features respectively corresponding to at least two time periods within the target time period, and the global inertial sensor features comprise global inertial sensor sub-features respectively corresponding to at least two time periods within the target time period;
    the splicing the global sound features and the global inertial sensor features comprises:
    splicing the global sound sub-features and the global inertial sensor sub-features respectively corresponding to the at least two time periods within the target time period, a number of dimensions of the global sound sub-features being the same as a number of dimensions of the global inertial sensor sub-features;
    the processing the spliced global sound features and global inertial sensor features based on the self-attention mechanism to obtain respective attention weights of the global sound features and the global inertial sensor features comprises:
    processing the spliced global sound features and global inertial sensor features based on the self-attention mechanism to obtain respective attention weights of the global sound sub-features and respective attention weights of the global inertial sensor sub-features; and
    the acquiring the fusion features based on the respective attention weights of the global sound features and the global inertial sensor features comprises:
    acquiring the fusion features based on the respective attention weights of the global sound sub-features and the respective attention weights of the global inertial sensor sub-features.
  4. The method according to claim 3, wherein the acquiring the fusion features based on the respective attention weights of the global sound sub-features and the respective attention weights of the global inertial sensor sub-features comprises:
    performing weighted summation or weighted averaging on the global sound sub-features and the global inertial sensor sub-features based on their respective attention weights, to obtain the fusion features.
  5. The method according to claim 1, wherein the environmental sound data comprises at least two audio data segments, and the performing feature extraction on the environmental sound data based on the time sequence of the environmental sound data to obtain global sound features comprises:
    performing audio feature extraction on the at least two audio data segments respectively to obtain respective Mel-frequency cepstral coefficient features of the at least two audio data segments;
    performing feature extraction on the respective Mel-frequency cepstral coefficient features of the at least two audio data segments to obtain respective sound local features of the at least two audio data segments; and
    processing the respective sound local features of the at least two audio data segments based on a self-attention mechanism according to a time-domain order of the at least two audio data segments, to obtain the global sound features.
  6. The method according to claim 5, wherein the processing the respective sound local features of the at least two audio data segments based on a self-attention mechanism according to the time-domain order of the at least two audio data segments, to obtain the global sound features, comprises:
    processing the respective sound local features of the at least two audio data segments based on the self-attention mechanism according to the time-domain order of the at least two audio data segments, to obtain respective attention weights of the at least two audio data segments; and
    performing weighting processing on the respective sound local features of the at least two audio data segments based on the respective attention weights of the at least two audio data segments, to obtain the global sound features.
  7. The method according to claim 1, wherein the inertial sensor data comprises at least two sensor data segments, and the performing feature extraction on the inertial sensor data based on the time sequence of the inertial sensor data to obtain global inertial sensor features comprises:
    performing feature extraction on the at least two sensor data segments to obtain respective sensor local features of the at least two sensor data segments; and
    processing the respective sensor local features of the at least two sensor data segments based on a self-attention mechanism according to a time-domain order of the at least two sensor data segments, to obtain the global inertial sensor features.
  8. The method according to claim 7, wherein the processing the respective sensor local features of the at least two sensor data segments based on a self-attention mechanism according to the time-domain order of the at least two sensor data segments, to obtain the global inertial sensor features, comprises:
    processing the respective sensor local features of the at least two sensor data segments based on the self-attention mechanism according to the time-domain order of the at least two sensor data segments, to obtain respective attention weights of the at least two sensor data segments; and
    performing weighting processing on the respective sensor local features of the at least two sensor data segments based on the respective attention weights of the at least two sensor data segments, to obtain the global inertial sensor features.
  9. The method according to any one of claims 1 to 8, wherein the traffic operation information is used to indicate whether the operating state of the public transport vehicle within the target time period is a stopped state;
    the executing an arrival reminder based on the traffic operation information comprises:
    executing the arrival reminder in a case that the traffic operation information indicates that the operating state of the public transport vehicle within the target time period is the stopped state.
  10. The method according to claim 9, wherein the executing the arrival reminder in a case that the traffic operation information indicates that the operating state of the public transport vehicle within the target time period is the stopped state comprises:
    acquiring a current location of the public transport vehicle in a case that the traffic operation information indicates that the operating state of the public transport vehicle within the target time period is the stopped state; and
    executing the arrival reminder in a case that the current location of the public transport vehicle matches a designated station on a target route, the designated station being a destination station or a transfer station on the target route.
  11. The method according to claim 10, further comprising, before the acquiring environmental sound data within a target time period and inertial sensor data within the target time period:
    displaying a route setting interface, and acquiring the target route according to a starting station and a destination station set by a user in the route setting interface.
  12. The method according to claim 10, further comprising, before the acquiring environmental sound data within a target time period and inertial sensor data within the target time period:
    performing route prediction according to a historical movement track of a user, to acquire the target route.
  13. The method according to claim 9, wherein the executing the arrival reminder in a case that the traffic operation information indicates that the operating state of the public transport vehicle within the target time period is the stopped state comprises:
    executing the arrival reminder in a case that the traffic operation information acquired N consecutive times indicates that the operating state of the public transport vehicle is the stopped state.
  14. An arrival reminder apparatus, comprising:
    a data acquisition module configured to acquire environmental sound data within a target time period and inertial sensor data within the target time period;
    a first feature extraction module configured to perform feature extraction on the environmental sound data based on a time sequence of the environmental sound data to obtain global sound features;
    a second feature extraction module configured to perform feature extraction on the inertial sensor data based on a time sequence of the inertial sensor data to obtain global inertial sensor features;
    a feature fusion module configured to fuse the global sound features and the global inertial sensor features based on a self-attention mechanism to obtain fusion features;
    an information acquisition module configured to acquire traffic operation information based on the fusion features, the traffic operation information being used to indicate an operating state of a public transport vehicle within the target time period; and
    a reminder module configured to execute an arrival reminder based on the traffic operation information.
  15. A terminal, comprising a processor and a memory, the memory storing at least one computer instruction, the at least one computer instruction being loaded and executed by the processor to implement the arrival reminder method according to any one of claims 1 to 13.
  16. A computer-readable storage medium, storing at least one computer instruction, the computer instruction being loaded and executed by a processor to implement the arrival reminder method according to any one of claims 1 to 13.
  17. A computer program product, comprising computer instructions, the computer instructions being executed by a processor of a terminal to cause the terminal to execute the arrival reminder method according to any one of claims 1 to 13.
PCT/CN2022/124453 2021-10-26 2022-10-10 到站提醒方法、装置、终端、存储介质及程序产品 WO2023071768A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111249921.8A CN113984078B (zh) 2021-10-26 2021-10-26 到站提醒方法、装置、终端及存储介质
CN202111249921.8 2021-10-26

Publications (1)

Publication Number Publication Date
WO2023071768A1 true WO2023071768A1 (zh) 2023-05-04

Family

ID=79741844

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/124453 WO2023071768A1 (zh) 2021-10-26 2022-10-10 到站提醒方法、装置、终端、存储介质及程序产品

Country Status (2)

Country Link
CN (1) CN113984078B (zh)
WO (1) WO2023071768A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117334072A (zh) * 2023-12-01 2024-01-02 青岛城运数字科技有限公司 公交车辆到站时刻预测方法和装置

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113984078B (zh) * 2021-10-26 2024-03-08 上海瑾盛通信科技有限公司 到站提醒方法、装置、终端及存储介质

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102770892A (zh) * 2009-12-21 2012-11-07 佳明瑞士有限责任公司 交通停车检测
US20170059329A1 (en) * 2015-08-25 2017-03-02 Siemens Industry, Inc. System and method for determining a location of a vehicle relative to a stopping point
CN110718089A (zh) * 2019-10-15 2020-01-21 Oppo(重庆)智能科技有限公司 一种出行服务方法、装置及计算机可读存储介质
CN111009261A (zh) * 2019-12-10 2020-04-14 Oppo广东移动通信有限公司 到站提醒方法、装置、终端及存储介质
CN111353467A (zh) * 2020-03-12 2020-06-30 Oppo广东移动通信有限公司 行驶状态识别方法、装置、终端及存储介质
CN111402617A (zh) * 2020-03-12 2020-07-10 Oppo广东移动通信有限公司 站点信息确定方法、装置、终端及存储介质
CN112556692A (zh) * 2020-11-27 2021-03-26 绍兴市北大信息技术科创中心 一种基于注意力机制的视觉和惯性里程计方法和系统
CN113380043A (zh) * 2021-08-12 2021-09-10 深圳市城市交通规划设计研究中心股份有限公司 一种基于深度神经网络计算的公交到站时间预测方法
CN113984078A (zh) * 2021-10-26 2022-01-28 上海瑾盛通信科技有限公司 到站提醒方法、装置、终端及存储介质

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101496730B1 (ko) * 2013-03-07 2015-02-27 최용석 이동 영상 실시간 전송 및 모니터링 시스템
CN110570858A (zh) * 2019-09-19 2019-12-13 芋头科技(杭州)有限公司 语音唤醒方法、装置、智能音箱和计算机可读存储介质
CN110660201B (zh) * 2019-09-23 2021-07-09 Oppo广东移动通信有限公司 到站提醒方法、装置、终端及存储介质
CN112651267A (zh) * 2019-10-11 2021-04-13 阿里巴巴集团控股有限公司 识别方法、模型训练、系统及设备
CN110880328B (zh) * 2019-11-20 2022-11-15 Oppo广东移动通信有限公司 到站提醒方法、装置、终端及存储介质
CN111079547B (zh) * 2019-11-22 2022-07-19 武汉大学 一种基于手机惯性传感器的行人移动方向识别方法
CN111383628B (zh) * 2020-03-09 2023-08-25 第四范式(北京)技术有限公司 一种声学模型的训练方法、装置、电子设备及存储介质
CN112489635B (zh) * 2020-12-03 2022-11-11 杭州电子科技大学 一种基于增强注意力机制的多模态情感识别方法
CN113177133B (zh) * 2021-04-23 2024-03-29 深圳依时货拉拉科技有限公司 一种图像检索方法、装置、设备及存储介质
CN113810539B (zh) * 2021-09-17 2023-03-24 上海瑾盛通信科技有限公司 到站提醒的方法、装置、终端及存储介质

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102770892A (zh) * 2009-12-21 2012-11-07 佳明瑞士有限责任公司 交通停车检测
US20170059329A1 (en) * 2015-08-25 2017-03-02 Siemens Industry, Inc. System and method for determining a location of a vehicle relative to a stopping point
CN110718089A (zh) * 2019-10-15 2020-01-21 Oppo(重庆)智能科技有限公司 一种出行服务方法、装置及计算机可读存储介质
CN111009261A (zh) * 2019-12-10 2020-04-14 Oppo广东移动通信有限公司 到站提醒方法、装置、终端及存储介质
CN111353467A (zh) * 2020-03-12 2020-06-30 Oppo广东移动通信有限公司 行驶状态识别方法、装置、终端及存储介质
CN111402617A (zh) * 2020-03-12 2020-07-10 Oppo广东移动通信有限公司 站点信息确定方法、装置、终端及存储介质
CN112556692A (zh) * 2020-11-27 2021-03-26 绍兴市北大信息技术科创中心 一种基于注意力机制的视觉和惯性里程计方法和系统
CN113380043A (zh) * 2021-08-12 2021-09-10 深圳市城市交通规划设计研究中心股份有限公司 一种基于深度神经网络计算的公交到站时间预测方法
CN113984078A (zh) * 2021-10-26 2022-01-28 上海瑾盛通信科技有限公司 到站提醒方法、装置、终端及存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117334072A (zh) * 2023-12-01 2024-01-02 青岛城运数字科技有限公司 公交车辆到站时刻预测方法和装置
CN117334072B (zh) * 2023-12-01 2024-02-23 青岛城运数字科技有限公司 公交车辆到站时刻预测方法和装置

Also Published As

Publication number Publication date
CN113984078B (zh) 2024-03-08
CN113984078A (zh) 2022-01-28

Similar Documents

Publication Publication Date Title
WO2023071768A1 (zh) 到站提醒方法、装置、终端、存储介质及程序产品
US11209275B2 (en) Motion detection method for transportation mode analysis
US9305317B2 (en) Systems and methods for collecting and transmitting telematics data from a mobile device
US20210394765A1 (en) Systems and methods for scoring driving trips
CN110660201B (zh) 到站提醒方法、装置、终端及存储介质
US10539586B2 (en) Techniques for determination of a motion state of a mobile device
WO2021169742A1 (zh) 交通工具运行状态的预测方法、装置、终端及存储介质
CN107886045B (zh) 设施满意度计算装置
CN111325386B (zh) 交通工具运行状态的预测方法、装置、终端及存储介质
CN107402397B (zh) 基于移动终端的用户活动状态确定方法、装置及移动终端
JPWO2018061491A1 (ja) 情報処理装置、情報処理方法、及びプログラム
CN110880328B (zh) 到站提醒方法、装置、终端及存储介质
CN110972112B (zh) 地铁运行方向的确定方法、装置、终端及存储介质
WO2021115232A1 (zh) 到站提醒方法、装置、终端及存储介质
US20170344123A1 (en) Recognition of Pickup and Glance Gestures on Mobile Devices
CN111402617B (zh) 站点信息确定方法、装置、终端及存储介质
EP3147831B1 (en) Information processing device and information processing method
JP6619316B2 (ja) 駐車位置探索方法、駐車位置探索装置、駐車位置探索プログラム及び移動体
CN113128115A (zh) 地铁运行状态的预测及模型的训练方法、装置及存储介质
EP3382570A1 (en) Method for characterizing driving events of a vehicle based on an accelerometer sensor
CN113810539B (zh) 到站提醒的方法、装置、终端及存储介质
JP7453889B2 (ja) 演算装置およびプログラム
EP2756658B1 (en) Detecting that a mobile device is riding with a vehicle
KR102260405B1 (ko) 인공신경망을 이용한 탑승객 인식 방법 및 그 장치
CN115631550A (zh) 一种用户反馈的方法和系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22885647

Country of ref document: EP

Kind code of ref document: A1