It is natural to relate the number of check-ins or the number of visitors at a venue to the popularity of that venue. Thus, we use the following measures to investigate the popularity of a venue:
a) cumulative number of check-ins at venue i ($CC_i$)
b) cumulative number of unique users of venue i ($CU_i$)
c) user check-in frequency at venue i ($UCF_i$)
We obtain $CC_i$ and $CU_i$ from the venue information in our dataset and calculate $UCF_i$ as
$$UCF_i = \frac{CC_i}{CU_i} \qquad (3.1)$$
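As a minimal sketch of Equation (3.1), the ratio can be computed per venue as shown here. The function name, the venue records and the handling of venues without users are illustrative assumptions, not part of the dataset or of the original analysis.

```python
from typing import Optional


def user_checkin_frequency(cc: int, cu: int) -> Optional[float]:
    """Return UCF = CC / CU for one venue, or None when the venue has no users."""
    if cu == 0:
        # Assumption: a venue without any user is skipped instead of being
        # assigned a UCF, since the ratio is undefined in that case.
        return None
    return cc / cu


# Hypothetical venues: name -> (cumulative check-ins, cumulative unique users).
venues = {"venue_a": (815, 1), "venue_b": (30, 12), "venue_c": (0, 0)}

# Compute the UCF of every venue in one pass.
ucf = {name: user_checkin_frequency(cc, cu) for name, (cc, cu) in venues.items()}
```

Returning `None` for venues with no users keeps undefined ratios out of any later statistics; other conventions are possible.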
We plot the empirical cumulative distribution function (CDF) of these three measures in Figure 3.1 and list the basic statistics of CC, CU and UCF in Table 3.2. Since the maximum cumulative number of check-ins and the maximum cumulative number of users at a venue are large (82,590 and 27,944, respectively), we use a logarithmic scale on the x-axis.
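The empirical CDF used for such a plot can be sketched in a few lines. The check-in counts are made-up values, and the helper name `ecdf` is an assumption for illustration only. Evaluating the function at powers of ten mirrors the logarithmic x-axis.

```python
from bisect import bisect_right


def ecdf(values):
    """Return a function F where F(x) is the fraction of values that are <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("ecdf requires at least one value")

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F


# Hypothetical cumulative check-in counts for ten venues.
cc = [1, 1, 3, 10, 12, 40, 95, 100, 250, 82590]
F = ecdf(cc)

share_single_checkin = F(1)      # fraction of venues with at most one check-in
share_at_most_100 = F(100)       # fraction of venues with at most 100 check-ins
log_grid = [F(10 ** k) for k in range(6)]  # CDF sampled at 1, 10, ..., 100000
```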
In Figure 3.1, we can see that the CC of about 85% of the venues is no more than 100 and the CU of about 96% of the venues is no more than 100. Moreover, about 18% of the venues have only one check-in and about 37% of the venues have only one user who checked in there. From Table 3.2, the CC of 50% of the venues is no more than 10 and the CU of 50% of the venues is no more than 2, while the means (94.70 and 26.87) are far larger than these medians; this indicates that CC and CU in Foursquare follow a long-tail distribution [65].

We use cumulative data to analyze venue popularity because we aim to investigate overall venue popularity rather than popularity during a certain time period. In [21], Noulas et al. analyze the check-ins collected during their data collection period, and the complementary cumulative distribution function (CCDF) of the number of check-ins also exhibits heavy-tail and power-law behavior. We also study the check-ins for venues in Section 5.4 and show the CCDF of check-ins in Figure 5.5; our results likewise show that a few venues obtain a large number of check-ins while a much larger number of venues have only a few. Finally, we focus on trending venues in Section 3.3 to investigate venue popularity during a certain time period.
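A mean far above the median is a quick symptom of the long-tail behavior discussed in this section. The sketch below computes the same kinds of summary statistics as Table 3.2 on made-up user counts. Whether the thesis reports the population or the sample standard deviation is not stated, so the use of `pstdev` here is an assumption.

```python
import statistics


def summarize(values):
    """Return min, max, mean, median and standard deviation of a sample."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Assumption: population standard deviation; use statistics.stdev
        # instead if the sample standard deviation is intended.
        "std": statistics.pstdev(values),
    }


# Hypothetical cumulative user counts: many small venues, one very large one.
cu = [1, 1, 1, 2, 2, 3, 5, 9, 40, 436]
summary = summarize(cu)

# In a long-tailed sample the few large venues pull the mean above the median.
is_right_skewed = summary["mean"] > summary["median"]
```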
Generally, a larger UCF indicates more check-ins per user on average. In Table 3.2, the maximum UCF is 815; it belongs to a residential address where a single user checked in 815 times. In Figure 3.1, the UCF of about 83% of the venues is no more than 10, which means that the average number of check-ins per user at these venues is at most 10. For 50% of the venues, the average number of check-ins per user is less than 3.
Table 3.2: Statistics of the CC, CU and UCF

Measure   Min   Max      Mean    Median   Standard deviation
CC        0     82,590   94.70   10       627.56
CU        0     27,944   26.87   2        218.19
UCF       1     815      8.68    2.5      21.99

Table 3.3: Top 10 popular venues based on cumulative number of check-ins and cumulative number of users

Rank   AC (cumulative check-ins)          AU (cumulative users)
1      Pittsburgh International Airport   Pittsburgh International Airport
2      CONSOL Energy Center               PNC Park
3      PNC Park                           CONSOL Energy Center
4      Ross Park Mall                     Heinz Field
5      Rivers Casino                      Rivers Casino
6      Heinz Field                        Ross Park Mall
7      Robinson Mall                      Hofbräuhaus Pittsburgh
8      Giant Eagle Market District        Robinson Mall
9      Lowes Theatre                      IKEA
10     Cathedral of Learning              Lowes Theatre

The UCF also reflects users' loyalty to a venue, especially for venues in the Food, Shop & Service and Nightlife Spot categories. Thus, the UCF of such venues can help not only business owners to understand their customers, but also customers to compare merchants that offer similar types of services. For example, suppose we define UCF as customer loyalty, the average customer loyalty in the Food category in Foursquare is about 3, and the customer loyalty of Tom's bakery house is 5. Tom can then see that his venue has many returning customers and that he should offer specials to reward their loyalty. Meanwhile, a customer looking for a good nearby bakery can compare the venue's total number of customers and its user check-in frequency with the corresponding average values. If both values for Tom's bakery house are higher than those of similar venues, this suggests that Tom's bakery house is better than average. However, this reasoning assumes that all check-ins are honest.
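The comparison in the bakery example can be sketched as follows. The records are invented, the function name is hypothetical, and taking the category average as the mean of per-venue values is an assumption, since the text does not say how the average is formed.

```python
from statistics import mean


def above_category_average(venue, peers):
    """Return True when the venue beats the peer averages in both CU and UCF."""
    # Assumption: peers without any user are ignored, as their UCF is undefined.
    valid = [p for p in peers if p["cu"] > 0]
    avg_cu = mean(p["cu"] for p in valid)
    avg_ucf = mean(p["cc"] / p["cu"] for p in valid)
    venue_ucf = venue["cc"] / venue["cu"]
    return venue["cu"] > avg_cu and venue_ucf > avg_ucf


# Hypothetical Food venues, each with a UCF of 3 (average CU is 20).
food_peers = [
    {"cc": 60, "cu": 20},
    {"cc": 90, "cu": 30},
    {"cc": 30, "cu": 10},
]

toms_bakery = {"cc": 200, "cu": 40}   # UCF = 5 with 40 customers
small_bakery = {"cc": 50, "cu": 10}   # UCF = 5 but only 10 customers

toms_is_better = above_category_average(toms_bakery, food_peers)
small_is_better = above_category_average(small_bakery, food_peers)
```

Requiring both values to exceed the average follows the wording of the example; a venue with loyal but very few customers, such as `small_bakery`, does not qualify.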
Since the most popular venues attract most of the users, we list the names of the 10 most popular venues by CC and by CU in Table 3.3. The 10 venues with the largest UCF are all Residence venues; we do not list them because of privacy concerns.
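A ranking of this kind can be produced by selecting the largest values of each measure. The sketch below uses invented venue names and counts, and the helper `top_k` is a hypothetical name. It also shows that the rankings by CC and by CU need not agree.

```python
import heapq


def top_k(venues, measure, k=10):
    """Return the names of the k venues with the largest value of `measure`."""
    ranked = heapq.nlargest(k, venues.items(), key=lambda item: item[1][measure])
    return [name for name, _ in ranked]


# Hypothetical venues with cumulative check-ins (cc) and unique users (cu).
venues = {
    "venue_a": {"cc": 900, "cu": 400},
    "venue_b": {"cc": 700, "cu": 450},
    "venue_c": {"cc": 500, "cu": 200},
    "venue_d": {"cc": 40, "cu": 8},
}

top_by_cc = top_k(venues, "cc", k=3)
top_by_cu = top_k(venues, "cu", k=3)
```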
Figure 3.2: Venue distribution in 9 top categories (Arts & Entertainment, College & University, Food, Professional & Others, Nightlife Spot, Great Outdoors, Shop & Service, Travel & Transport, Residence; axis range 0 to 3 × 10^4)