Background
In recent years, with the rapid development of communication and computer technology, Intelligent Transportation Systems (ITS) have attracted many researchers, and intelligent taxi technology, an important component of ITS, is receiving wide attention. Meanwhile, passengers now expect more from taxi services than simply waiting for a cab, and more intelligent matching services have been proposed in succession, such as taxi reservation, carpooling, and optimal driving route services. None of these services can be provided without analyzing historical taxi trajectory data. Because a large amount of trajectory data is generated every day, fast data analysis is urgently needed to meet the real-time requirements of each service. Although big data platforms are an important means of improving analysis efficiency, the strategy used to analyze the historical trajectory data matters more, and many researchers combine offline processing of historical trajectories with online service recommendation to keep recommendations real-time.
Existing recommendation work takes two perspectives. The first, from the passenger's perspective, recommends the best location for hailing a taxi, so as to improve the passenger's chance of finding one, and provides the expected arrival time and arrival probability of a taxi. The second, from the driver's perspective, recommends an optimal passenger-searching route: it analyzes the passenger-searching hotspot areas near the taxi, estimates the expected passenger-searching time and passenger-searching probability for each hotspot the taxi could travel to, and screens and recommends hotspots according to these two factors in order to reduce the driver's passenger-searching cost. Because the taxi historical data generated each day differ, the choice of historical data strongly affects the accuracy of the recommendation service. At present the field generally divides taxi historical data coarsely into working-day and non-working-day data, which is not fine-grained enough. Many factors affect travel, such as weather, holidays, and time of day. To select historical data reasonably for a recommendation service, these factors should be considered together and suitable historical data should be selected dynamically in real time, rather than relying on a single factor.
When searching for passengers, experienced drivers can readily find passenger-carrying hotspot areas such as railway stations and movie theaters. However, the number of passengers in these areas changes over time, so simply recommending hotspot areas to drivers is not enough. Both the changes in passenger numbers at the hotspots and the passenger-searching schemes that are not obvious to taxi drivers are hidden in the historical taxi trajectory data. The TCSFP passenger-searching strategy is therefore proposed, integrating offline processing of historical taxi trajectory data with online processing of real-time traffic information.
Disclosure of Invention
The invention aims to provide an intelligent taxi passenger-searching method that addresses the difficulty of matching taxis with passengers in a city. By combining historical taxi trajectory data with real-time traffic information, it provides a taxi passenger-searching strategy based on passenger volume prediction (TCSFP).
The invention is realized by the following technical scheme: an intelligent taxi passenger-searching method comprising the following steps:
Step one: passenger volume prediction: predicting the passenger volume of the passenger-carrying hotspot areas in a city based on historical taxi trajectory data, screening out from the historical data the dates whose passenger volume is similar to the predicted current passenger volume, and generating a spatio-temporal index for the screened dates;
Step two: constructing a passenger-searching index database: establishing, based on the historical taxi trajectory data, a passenger-searching efficiency database for the passenger-carrying hotspot areas and a database of driving times between hotspots, wherein the passenger-searching efficiency database contains the passenger-searching time, passenger-carrying probability, and passenger-carrying revenue of each hotspot;
Step three: passenger-carrying hotspot screening: according to the spatio-temporal index generated in step one, selecting from step two the passenger-searching efficiency database and the inter-hotspot driving time database for the corresponding dates, weighing the passenger-searching efficiency of the taxi traveling to different hotspots, and screening out the optimal hotspot area, wherein the screening principle is given by formula (1), in which $i$ is the index of the area where the taxi is currently located, $j$ is the index of the $j$-th hotspot area, $m$ is the $m$-th time period of the day, and the remaining terms are the time required for the taxi to travel to the $j$-th hotspot, the time required for the taxi to find a passenger at the $j$-th hotspot, the passenger-carrying probability $p_{jm}$ of the $j$-th hotspot, and the average passenger-carrying revenue $r_{jm}$ of the $j$-th hotspot.
Further, step one comprises the following sub-steps:
Step 1.1: extracting all passenger pickup points from the historical data and counting the number of pickups occurring in each time period, as shown in formula (2), where $h_i$ denotes the $i$-th hotspot area, $d_j$ denotes the $j$-th day in the taxi historical data, $L$ is the number of time periods into which a day is divided, and each entry of formula (2) is the number of pickups occurring during the $k$-th time period;
Step 1.2: collecting all the passenger volume data of hotspot $h_i$ in the historical data, as shown in formula (3):
$M(h_i) = \{P(h_i, d_1), P(h_i, d_2), \ldots, P(h_i, d_n)\}$ (3)
assuming that the passenger volume of the $m$-th time period of the $i$-th hotspot is to be predicted and that the passenger volumes of the preceding $k$ time periods are used as prediction features, the machine learning features and labels shown in formulas (4) and (5) can be constructed;
Step 1.3: training a machine learning classification model on the features and labels given by formulas (4) and (5), then finding in formula (5) the passenger volume closest to the prediction result; if the passenger volume of the $m$-th time period of the $j$-th day is the closest, the historical data of the $m$-th time period of the $j$-th day is the most reasonable basis for analyzing the passenger-searching indices of the hotspot.
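As a rough illustration of how the features and labels of formulas (4) and (5) might be assembled, the sketch below builds one training sample per historical day: the pickup counts of the k periods preceding period m form the feature vector, and the count in period m is the label. The exact layout of formulas (4) and (5) is not reproduced in this text, so the function name `build_samples`, the 0-based period numbering, and the list-of-lists layout are assumptions made only for illustration.

```python
from typing import List, Tuple

def build_samples(daily_counts: List[List[int]], m: int, k: int) -> Tuple[List[List[int]], List[int]]:
    """Build (features, labels) for predicting period m from the k periods before it.

    daily_counts[j][t] is the pickup count at one hotspot on day j in period t
    (0-based). This layout is an assumption, not the patent's own notation.
    """
    if k < 1 or m < k:
        raise ValueError("period m must have at least k earlier periods")
    features, labels = [], []
    for counts in daily_counts:
        features.append(counts[m - k:m])  # the k periods preceding m
        labels.append(counts[m])          # the volume to be predicted
    return features, labels

# Made-up counts for two days, five periods each.
history = [
    [3, 5, 8, 13, 9],
    [2, 4, 7, 11, 8],
]
X, y = build_samples(history, m=3, k=2)
print(X, y)
```

Any model that accepts such feature and label lists could then be trained on them; the text does not name a specific model.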
Further, in step two, the passenger-searching time, the passenger-carrying probability $p_{jk}$, and the passenger-carrying revenue $r_{jk}$ are quantized, the quantization result being shown in formula (6), where the three quantities in formula (6) are the quantized passenger-searching time, passenger-carrying probability, and passenger-carrying revenue, respectively, each taking a value between 0 and $k-1$;
a transformed quantization result is then used in place of the quantized passenger-searching time, giving formula (7);
the selection target thus becomes finding the hotspots for which all three quantized values are large.
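The quantization of formula (6) and the transformation behind formula (7) are not reproduced in this text. The sketch below therefore assumes one plausible scheme: equal-width binning into levels 0 to k-1, and flipping the time level as (k-1) minus the level so that a short passenger-searching time scores high. Both `quantize` and `invert` are hypothetical helpers, not the patent's formulas.

```python
from typing import List

def quantize(values: List[float], k: int) -> List[int]:
    """Map raw values to integer levels 0..k-1 using equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / k
    # min(..., k - 1) keeps the maximum value inside the top level.
    return [min(int((v - lo) / width), k - 1) for v in values]

def invert(levels: List[int], k: int) -> List[int]:
    """Flip levels so that a short search time gets a high score (assumed transform)."""
    return [(k - 1) - q for q in levels]

# Made-up passenger-searching times in minutes for four hotspots.
search_minutes = [4.0, 10.0, 16.0, 7.0]
time_levels = quantize(search_minutes, k=4)
time_scores = invert(time_levels, k=4)
print(time_levels, time_scores)
```

After the flip, all three indices point the same way (larger is better), which is what allows the selection target to be stated as "all three values large".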
Further, step three comprises the following sub-steps:
Step 3.1: setting a weight for each hotspot, the weight being the sum of the quantized passenger-searching time, passenger-carrying probability, and passenger-carrying revenue of that hotspot (formula (8));
Step 3.2: sorting the weights of the passenger-carrying hotspots from largest to smallest and recording the sorted result as:
$S(h_i, m) = (\mathrm{weight}_1, \mathrm{weight}_2, \ldots, \mathrm{weight}_l)$ (9)
then screening out the edge hotspots and recommending them to the taxi driver.
The beneficial effects of the invention are as follows. Starting from the goal of recommending the best passenger-searching strategy to an empty cruising taxi, the invention combines historical taxi trajectory data with online passenger volume information to provide a passenger-searching strategy based on passenger volume prediction (TCSFP). The strategy has two stages: a passenger volume prediction stage and a passenger-carrying hotspot screening stage. The passenger volume prediction stage generates a spatio-temporal index, so that historical trajectory data can be selected dynamically; the hotspot screening stage recommends passenger-searching hotspots to the taxi. Because the strategy is split into online and offline processing, the computation time of the recommendation service can be greatly shortened.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, the invention is realized by the following technical scheme: an intelligent taxi passenger-searching method comprising the following steps:
Step one: passenger volume prediction: predicting the passenger volume of the passenger-carrying hotspot areas in a city based on historical taxi trajectory data, screening out from the historical data the dates whose passenger volume is similar to the predicted current passenger volume, and generating a spatio-temporal index for the screened dates;
Step two: constructing a passenger-searching index database: establishing, based on the historical taxi trajectory data, a passenger-searching efficiency database for the passenger-carrying hotspot areas and a database of driving times between hotspots, wherein the passenger-searching efficiency database contains the passenger-searching time, passenger-carrying probability, and passenger-carrying revenue of each hotspot;
Step three: passenger-carrying hotspot screening: according to the spatio-temporal index generated in step one, selecting from step two the passenger-searching efficiency database and the inter-hotspot driving time database for the corresponding dates, weighing the passenger-searching efficiency of the taxi traveling to different hotspots, and screening out the optimal hotspot area, wherein the screening principle is given by formula (1), in which $i$ is the index of the area where the taxi is currently located, $j$ is the index of the $j$-th hotspot area, $m$ is the $m$-th time period of the day, and the remaining terms are the time required for the taxi to travel to the $j$-th hotspot, the time required for the taxi to find a passenger at the $j$-th hotspot, the passenger-carrying probability $p_{jm}$ of the $j$-th hotspot, and the average passenger-carrying revenue $r_{jm}$ of the $j$-th hotspot.
Specifically, step one uses passenger volume prediction to dynamically select which processed historical data to read from the database; step two builds the databases that supply the passenger-searching efficiency of each hotspot and the driving times between hotspots, which speeds up the recommendation in step three. The invention integrates offline processing of historical taxi trajectory data with online processing of real-time traffic information to provide the TCSFP passenger-searching strategy, as shown in Fig. 1. The strategy is divided into an online process and an offline process: the processing above the dotted line in Fig. 1 is the offline processing of taxi historical data, and the processing below the dotted line is the online recommendation process. The historical data processing has two parts: the left dotted box represents the passenger volume prediction stage, and the right dotted box represents the passenger-carrying hotspot recommendation stage. The passenger-searching time, passenger-carrying probability, and passenger-carrying revenue are key factors in screening passenger-carrying hotspots and strongly affect passenger-searching performance; because they change in real time, they must be obtained quickly and evaluated together.
Referring to Fig. 1, in this preferred embodiment, step one comprises the following sub-steps:
Step 1.1: extracting all passenger pickup points from the historical data and counting the number of pickups occurring in each time period, as shown in formula (2), where $h_i$ denotes the $i$-th hotspot area, $d_j$ denotes the $j$-th day in the taxi historical data, $L$ is the number of time periods into which a day is divided, and each entry of formula (2) is the number of pickups occurring during the $k$-th time period;
Step 1.2: collecting all the passenger volume data of hotspot $h_i$ in the historical data, as shown in formula (3):
$M(h_i) = \{P(h_i, d_1), P(h_i, d_2), \ldots, P(h_i, d_n)\}$ (3)
assuming that the passenger volume of the $m$-th time period of the $i$-th hotspot is to be predicted and that the passenger volumes of the preceding $k$ time periods are used as prediction features, the machine learning features and labels shown in formulas (4) and (5) can be constructed;
Step 1.3: training a machine learning classification model on the features and labels given by formulas (4) and (5), then finding in formula (5) the passenger volume closest to the prediction result; if the passenger volume of the $m$-th time period of the $j$-th day is the closest to the prediction result, then using the historical data of the $m$-th time period of the $j$-th day to analyze the passenger-searching indices of the hotspot is the most reasonable choice.
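The counting described in step 1.1 can be sketched as follows. This is a minimal illustration under stated assumptions: pickups are assumed to arrive as (hotspot id, timestamp) pairs already assigned to a hotspot, and `count_pickups` is a hypothetical helper; the patent text does not specify the record format.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

def count_pickups(pickups: List[Tuple[str, datetime]],
                  periods_per_day: int) -> Dict[str, Dict[str, List[int]]]:
    """Return counts[hotspot][date] as a list with one pickup count per time period."""
    counts: Dict[str, Dict[str, List[int]]] = defaultdict(dict)
    for hotspot, ts in pickups:
        day = ts.date().isoformat()
        per_day = counts[hotspot].setdefault(day, [0] * periods_per_day)
        seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
        # Scale seconds-of-day into a period index in 0..periods_per_day-1.
        per_day[seconds * periods_per_day // 86400] += 1
    return dict(counts)

# Made-up pickup records: (hotspot id, pickup timestamp).
pickups = [
    ("h1", datetime(2020, 5, 1, 8, 15)),
    ("h1", datetime(2020, 5, 1, 8, 40)),
    ("h1", datetime(2020, 5, 1, 19, 5)),
    ("h1", datetime(2020, 5, 2, 8, 0)),
]
counts = count_pickups(pickups, periods_per_day=24)
print(counts["h1"]["2020-05-01"][8])  # pickups between 08:00 and 09:00 on the first day
```

In this layout, the per-day list plays the role of one $P(h_i, d_j)$ and the inner dictionary for a hotspot plays the role of $M(h_i)$ in formula (3).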
Specifically, because travel is influenced by many factors, such as weather and holidays, selecting suitable historical data for analysis is critical when finding the best passenger-searching hotspot for a taxi. A date whose passenger volume is similar to the predicted result is screened from the historical data, on the reasonable assumption that passenger travel on that date resembles travel on the current day. A spatio-temporal index is generated from the screened date and passed to step three as the basis for selecting historical data there.
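The date screening can be pictured as a nearest-value lookup. The sketch below assumes the spatio-temporal index is simply a (date, time period) pair and that the historical volumes for period m are available as a dictionary keyed by date; both choices, and the helper name `closest_day`, are illustrative assumptions.

```python
from typing import Dict, Tuple

def closest_day(predicted: float, history: Dict[str, int]) -> str:
    """Return the date whose recorded volume in period m is nearest to the prediction."""
    return min(history, key=lambda day: abs(history[day] - predicted))

# Made-up volumes recorded at one hotspot in period m on three historical days.
m = 8
volumes_in_period_m = {"2020-05-01": 42, "2020-05-02": 18, "2020-05-03": 31}

index_day = closest_day(predicted=29.0, history=volumes_in_period_m)
spatio_temporal_index: Tuple[str, int] = (index_day, m)  # assumed form of the index
print(spatio_temporal_index)
```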
Referring to Fig. 1, in this preferred embodiment, in step two, for a taxi traveling from its starting position $h_i$ to the passenger-carrying hotspot $h_j$, the passenger-searching time, passenger-carrying probability, and passenger-carrying revenue are indices of different types, so the passenger-searching time, $p_{jk}$, and $r_{jk}$ must be quantized, the quantization result being shown in formula (6), where the three quantities in formula (6) are the quantized passenger-searching time, passenger-carrying probability, and passenger-carrying revenue, respectively, each taking a value between 0 and $k-1$.
The passenger-searching time should be as small as possible, whereas the passenger-carrying probability and the passenger-carrying revenue should be as large as possible. A transformed quantization result is therefore used in place of the quantized passenger-searching time; it still lies between 0 and $k-1$, and the selection target becomes finding the hotspots for which all three quantized values are large, as shown in formula (7).
As shown in Fig. 2, the "edge hotspots" far from the three coordinate axes should be selected as far as possible, and the "interior hotspots" near the origin should be deleted, so hotspot $h_2$ is deleted.
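The distinction between edge and interior hotspots can be expressed as a dominance test: a hotspot is interior when some other hotspot is larger on all three quantized indices. The sketch below checks this pairwise on made-up values (not the data of Fig. 2); the names `dominates` and `edge_hotspots` are hypothetical.

```python
from typing import Dict, List, Tuple

# (flipped time level, probability level, revenue level); larger is better on each.
Indices = Tuple[int, int, int]

def dominates(a: Indices, b: Indices) -> bool:
    """True if a is strictly larger than b on every index."""
    return all(x > y for x, y in zip(a, b))

def edge_hotspots(hotspots: Dict[str, Indices]) -> List[str]:
    """Keep the hotspots that no other hotspot dominates."""
    return [
        name for name, idx in hotspots.items()
        if not any(dominates(other, idx) for other in hotspots.values())
    ]

hotspots = {"h1": (3, 1, 2), "h2": (1, 0, 1), "h3": (0, 3, 3)}
print(edge_hotspots(hotspots))
```

Here the made-up h2 is smaller than h1 on every index, so it is treated as interior and dropped, while h1 and h3 each win on at least one index and remain.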
Specifically, because recommending a passenger-searching scheme must be done in real time, the historical trajectory data is processed offline. The passenger-carrying probability and passenger-carrying revenue of a taxi correspond directly to those of each hotspot and are relatively easy to extract from the historical data. The passenger-searching time of a taxi consists of the time the driver needs to reach a hotspot plus the average passenger-searching time at that hotspot. The attributes of a hotspot are therefore: hotspot passenger-searching time, passenger-carrying probability, and passenger-carrying revenue. The time the taxi needs to reach a hotspot is obtained from the inter-hotspot driving time database, and the taxi's passenger-searching time is the sum of that travel time and the hotspot's average passenger-searching time. This step belongs to the offline process.
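The composition of the taxi's passenger-searching time can be illustrated with two lookups. In this sketch the two databases are stood in for by plain dictionaries; the key layouts, field names, and minute units are assumptions, since the text does not describe the database schema.

```python
from typing import Dict, Tuple

def total_search_time(origin: str, hotspot: str, period: int,
                      travel_db: Dict[Tuple[str, str, int], float],
                      efficiency_db: Dict[Tuple[str, int], Dict[str, float]]) -> float:
    """Travel time from origin to hotspot plus the hotspot's average search time."""
    travel = travel_db[(origin, hotspot, period)]
    waiting = efficiency_db[(hotspot, period)]["search_time"]
    return travel + waiting

# Made-up entries, in minutes, for time period 8.
travel_db = {("h1", "h2", 8): 12.0}
efficiency_db = {("h2", 8): {"search_time": 5.0, "probability": 0.6, "revenue": 23.5}}

print(total_search_time("h1", "h2", 8, travel_db, efficiency_db))
```

Because both tables are built offline, the online part reduces to dictionary-style lookups and one addition per candidate hotspot.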
Referring to Fig. 1, in this preferred embodiment, step three comprises the following sub-steps:
Step 3.1: setting a weight for each hotspot, the weight being the sum of the quantized passenger-searching time, passenger-carrying probability, and passenger-carrying revenue of that hotspot (formula (8));
Step 3.2: sorting the weights of the passenger-carrying hotspots from largest to smallest and recording the sorted result as:
$S(h_i, m) = (\mathrm{weight}_1, \mathrm{weight}_2, \ldots, \mathrm{weight}_l)$ (9)
Step 3.3: for each element of the sorted result $S(h_i, m)$, judging whether it belongs to an edge hotspot:
1. Since $\mathrm{weight}_1$ is the maximum weight, the hotspot it belongs to is an edge hotspot. The proof is as follows:
Suppose hotspot $h_a$ has the greatest weight and is an "interior" hotspot. Then there must exist a hotspot $h_b$ whose passenger-searching time, passenger-carrying probability, and passenger-carrying revenue indices are all larger than those of $h_a$, so the weight $\mathrm{weight}(h_i, h_b, k)$ of $h_b$ is greater than the weight $\mathrm{weight}(h_i, h_a, k)$ of $h_a$. This contradicts the assumption that $h_a$ has the greatest weight, so $h_a$ is an edge hotspot.
2. If the passenger-searching time, passenger-carrying probability, and passenger-carrying revenue indices of the hotspot of $\mathrm{weight}_2$ are all smaller than those of the hotspot of $\mathrm{weight}_1$, then the hotspot of $\mathrm{weight}_2$ is an interior hotspot; otherwise it is an edge hotspot. The proof is as follows:
Suppose the hotspot of $\mathrm{weight}_2$ is an interior hotspot. Then there exists a hotspot $h_b$ whose passenger-searching time, passenger-carrying probability, and passenger-carrying revenue indices are all larger than those of the hotspot of $\mathrm{weight}_2$, so the weight of $h_b$ is greater than $\mathrm{weight}_2$. The only weight greater than $\mathrm{weight}_2$ is $\mathrm{weight}_1$, so $h_b$ must be the hotspot of $\mathrm{weight}_1$. This contradicts the assumption that the hotspot of $\mathrm{weight}_1$ is not larger on all three indices, which completes the proof.
By analogy, each element of $S(h_i, m)$ is compared with the edge hotspots already selected and judged one by one; the edge hotspots are screened out and recommended to the taxi driver.
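The sorted screening of steps 3.1 to 3.3 can be sketched as a single pass over the hotspots in descending weight order, comparing each candidate only against the edge hotspots already selected. The sketch assumes the three indices are already quantized with the time index flipped so that larger is better; `screen_edge_hotspots` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple

# (flipped time level, probability level, revenue level); larger is better on each.
Indices = Tuple[int, int, int]

def screen_edge_hotspots(hotspots: Dict[str, Indices]) -> List[str]:
    """Return edge hotspots, visiting candidates from largest to smallest weight."""
    # Steps 3.1 and 3.2: weight = sum of the three indices, sorted descending.
    ranked = sorted(hotspots, key=lambda name: sum(hotspots[name]), reverse=True)
    edges: List[str] = []
    # Step 3.3: a candidate is interior if a selected edge hotspot beats it everywhere.
    for name in ranked:
        candidate = hotspots[name]
        dominated = any(
            all(e > c for e, c in zip(hotspots[edge], candidate))
            for edge in edges
        )
        if not dominated:
            edges.append(name)
    return edges

hotspots = {"h1": (3, 1, 2), "h2": (1, 0, 1), "h3": (0, 3, 2), "h4": (2, 2, 0)}
print(screen_edge_hotspots(hotspots))
```

Comparing only against selected edge hotspots suffices because any hotspot that beats a candidate on all three indices has a strictly larger weight and is therefore visited earlier; if that hotspot is itself interior, the edge hotspot that beats it also beats the candidate.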
Specifically, as can be seen from Fig. 1, a taxi passenger-searching request passes through only one processing step online; all other processing is completed offline or in advance, so the time spent serving the request can be greatly shortened.