Hi 👋 I'm the founder of Castle, a developer-first fraud prevention platform that, among many other features, helps you automatically collect and analyze location indicators to estimate a user's location.
In this post, I delve into various methods for predicting user location and their effectiveness while considering the importance of user privacy.
Determining a user's true location, even when they are behind a proxy (more on proxy piercing here), plays a crucial role in fraud detection and prevention. For example, you may need to identify the real location in order to:
- Detect stolen identities being used to open a bank account
- Make discounts available to users from specific countries
- Restrict access from certain countries due to regulations
While there is no one-size-fits-all solution or "magic hack" to reveal a user's true location, especially not with accuracy better than country level, combining multiple data points can significantly increase the likelihood of identifying their location or, at the very least, casting doubt on the authenticity of their claimed whereabouts.
What methods can developers implement to gather and analyze information that can help more accurately determine a user's true location, or at the very least provide an indication that the user is likely not in the country they claim to be?
Methods to Help Predict a User's Location
1. IP Geolocation
IP geolocation services are the natural starting point for developers aiming to determine a user's location, providing a solid foundation for location prediction. These services are delivered via APIs or downloadable databases and typically return valuable data such as country, region, city, latitude, longitude, as well as the accuracy of the coordinates.
MaxMind's GeoLite2 is a popular free database to get started with; from there you can graduate to MaxMind's paid offerings, which are known for their high accuracy in mapping IP addresses to countries, regions, and cities. There are also many other developer-friendly services, such as IPinfo and ipstack, that are extremely easy to use. In most cases, the difference in accuracy between these services is not noticeable.
That said, if pinpointing a user's location were this easy, we wouldn't have felt compelled to write this article in the first place. The challenge arises in fraud scenarios, where bad actors often hide behind proxies, rendering the location information from IP geolocation useless. The techniques outlined below go beyond IP geolocation and can help developers estimate a user's true location more reliably, even in the face of proxy use.
2. Timezone Settings
The first method we'll examine is quite simple, doesn't require user consent, and often serves as an initial indicator that the user is behind a proxy. We'll use JavaScript to fetch timezone data, which provides an approximate location.
Try pasting the following into your browser console:
> new Date().getTimezoneOffset() / -60
-7
> Intl.DateTimeFormat().resolvedOptions().timeZone
'America/Los_Angeles'
The first line logs the timezone UTC offset in hours, and the second line logs the IANA timezone identifier. The IANA timezone identifier is more informative, as it provides the exact timezone region, whereas the offset only offers the offset in hours from Coordinated Universal Time (UTC).
The trick is to compare the timezones of the country reported by IP geolocation with the offset obtained from the browser. It's wise to allow a buffer of plus or minus 60 minutes, since the value can be inaccurate close to country borders; a more advanced implementation also takes daylight saving time into account.
Similar data can be fetched on native iOS (TimeZone docs) and Android (TimeZone docs).
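The comparison described above could be sketched roughly as follows. The `expectedOffsetsByCountry` table and the `isTimezoneConsistent` helper are hypothetical names introduced for illustration; a real implementation would derive the offsets from a timezone database rather than a hand-written table.

```javascript
// Hypothetical lookup: standard-time UTC offsets (in minutes) per ISO country.
// This is an illustrative subset, not a complete or authoritative table.
const expectedOffsetsByCountry = {
  SE: [60],
  GB: [0],
  US: [-300, -360, -420, -480, -540, -600],
};

// Returns true or false, or null when the country is not in the table.
// The default 60-minute buffer also absorbs a one-hour daylight saving shift.
function isTimezoneConsistent(ipCountry, reportedOffsetMinutes, bufferMinutes = 60) {
  const expected = expectedOffsetsByCountry[ipCountry];
  if (!expected) return null;
  return expected.some(
    (offset) => Math.abs(offset - reportedOffsetMinutes) <= bufferMinutes
  );
}

// In the browser, getTimezoneOffset() returns minutes *behind* UTC, so negate it:
// const reported = -new Date().getTimezoneOffset();
// isTimezoneConsistent('US', reported);
```

A `false` result is only a hint that the IP address and the device disagree; it works best as one input among the other indicators.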
3. Language Settings
This method is also quite easy to implement, but it's harder to draw conclusions from it. We'll use HTTP headers and JavaScript to obtain the user's language settings.
The JS techniques are as follows:
> navigator.language || navigator.userLanguage
'es-US'
> navigator.languages
['es-US', 'es', 'sv']
You can also parse out the HTTP header `Accept-Language` on the server side which looks something like this: "es-US,es;q=0.9,sv;q=0.8".
Similar data can be retrieved on iOS (Locale docs) and Android (Locale docs).
So, retrieving these isn't all that hard, but actually making sense of them is. You'll need to use a mapping database such as https://github.com/hotosm/iso-countries-languages to map your languages to countries. The challenging part is that some languages, such as English or Spanish, are spoken in several countries, and the user could simply be a visitor in the country, so take this signal with a grain of salt. Nevertheless, it's a valuable data point in the overall pinpointing exercise.
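As a rough sketch of the server-side parsing, the following turns an `Accept-Language` header into an ordered list of language tags and pulls out any explicit region subtags. The function names are made up for this example, and mapping bare languages such as `sv` to countries would still require a database like the one linked above.

```javascript
// Parses an Accept-Language header into language tags sorted by preference (q).
function parseAcceptLanguage(header) {
  return header
    .split(',')
    .map((part) => {
      const [tag, ...params] = part.trim().split(';');
      const qParam = params.find((p) => p.trim().startsWith('q='));
      const q = qParam ? parseFloat(qParam.trim().slice(2)) : 1;
      return { tag: tag.trim(), q: Number.isNaN(q) ? 0 : q };
    })
    .filter((entry) => entry.tag && entry.tag !== '*')
    .sort((a, b) => b.q - a.q)
    .map((entry) => entry.tag);
}

// Extracts explicit two-letter region subtags (e.g. "US" from "es-US"),
// without duplicates. Tags with no region subtag contribute nothing.
function regionsFromLanguageTags(tags) {
  const regions = tags
    .map((tag) => tag.split('-').find((sub, i) => i > 0 && /^[A-Za-z]{2}$/.test(sub)))
    .filter(Boolean)
    .map((region) => region.toUpperCase());
  return [...new Set(regions)];
}

const tags = parseAcceptLanguage('es-US,es;q=0.9,sv;q=0.8');
console.log(tags, regionsFromLanguageTags(tags));
// [ 'es-US', 'es', 'sv' ] [ 'US' ]
```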
4. Phone Carrier
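The phone carrier isn't available natively in web browsers, but it can be obtained on iOS (CoreTelephony docs) and Android (TelephonyManager docs) without user consent, and will give you the country code and carrier name. Note that Apple has deprecated the CoreTelephony carrier API in recent iOS releases, so check the current platform documentation before relying on it.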
If you have a mobile app, this method is one of the most effective ways of determining someone's actual location at the country level. The only caveat is that if the device is jailbroken or rooted, this information can be overridden by the bad actor, so make sure you have ways of detecting this scenario.
5. Address Geocoding
Extracting location information from user-provided addresses or points of interest gives you a solid data point. For many services, such as taxi hailing, food delivery, e-commerce, or financial services, the user is required to enter at least one street address, and it needs to be accurate for them to receive their goods or have their identity documents matched.
Developer APIs such as Radar, OpenCage, and Google can do this for you. Alternatively, if you want to avoid the API latency, you can download an open dataset such as OpenStreetMap or GeoNames, which should be good enough for country and region resolution even when used offline.
6. Native Location Services
Leveraging device GPS and browser APIs for user location data on both web and mobile platforms is possibly the most accurate approach, but it requires user consent. So, unless your service already relies on location services, asking the user for this permission might seem suspicious. There's also a reliability issue when used in a browser context, where extensions like Location Guard can be easily employed to spoof the coordinates.
Run this JS example and you'll see how the browser prompts you with a popup before outputting the coordinates:
navigator.geolocation.getCurrentPosition(
  (position) => {
    console.log('Latitude:', position.coords.latitude);
    console.log('Longitude:', position.coords.longitude);
  },
  (error) => {
    console.error('Error retrieving location:', error);
  }
);
On iOS, you'll use CoreLocation for similar functionality, while on Android, there's LocationManager. Remember to first declare the appropriate permissions in your app's manifest or Info.plist file and handle the user's consent.
7. Email Domain Format
While many popular email domains, such as gmail.com, are not indicative of a specific country, others are, including university domains and country-code top-level domains like .se for Sweden. A library such as https://github.com/zhijing-jin/email2country for Python can help you resolve the country and handle the gmail.com case:
>>> batch_email2institution_country(['nyu.edu','gmail.com', 'hku.hk'])
['United States', None, 'Hong Kong']
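If you only need the top-level-domain part of this signal in JavaScript, a minimal sketch could look like the one below. The `ccTldToCountry` table is a tiny illustrative subset and `countryFromEmailDomain` is a hypothetical helper, not part of the library above; it does not resolve institution domains such as nyu.edu.

```javascript
// Illustrative subset of country-code TLDs mapped to ISO country codes.
// Some ccTLDs (for example .io or .co) are widely used as generic domains,
// so a real table should leave those out or weight them lower.
const ccTldToCountry = new Map([
  ['se', 'SE'],
  ['hk', 'HK'],
  ['de', 'DE'],
  ['fr', 'FR'],
  ['jp', 'JP'],
]);

// Returns an ISO country code for known country-code TLDs, or null for
// domains such as gmail.com that carry no location signal.
function countryFromEmailDomain(email) {
  const domain = email.slice(email.lastIndexOf('@') + 1).toLowerCase();
  const tld = domain.slice(domain.lastIndexOf('.') + 1);
  return ccTldToCountry.get(tld) ?? null;
}

console.log(countryFromEmailDomain('anna@example.se')); // SE
console.log(countryFromEmailDomain('someone@gmail.com')); // null
```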
8. Phone Number Format
If your service collects users' phone numbers, you can take advantage of the fact that every country has a specific code assigned to it, and each code can be assigned to one or more countries. For example, a phone number beginning with the country code +1 is associated with the United States and Canada, and +46 is Sweden. Some countries, like the US, also use more local codes such as +1(415) for San Francisco, but it's far from accurate as people keep their numbers when they move to other cities or states.
To determine the country from a phone number in JavaScript, you can use Google's libphonenumber library, a free tool for parsing, formatting, and validating international phone numbers without making an API request for each lookup. It is in fact the backbone of many commercial phone lookup APIs, so you may want to compare the data before paying for something you can get for free.
If you're on Node, install `google-libphonenumber`, create a file called phone_country.js, and paste the following code:
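const libphonenumber = require('google-libphonenumber');
const phoneUtil = libphonenumber.PhoneNumberUtil.getInstance();

// Numbers in international format (leading +) can be parsed without a default region.
const parsedNumber = phoneUtil.parse('+14152539225');
// Resolve the two-letter region code the number belongs to.
const countryCode = phoneUtil.getRegionCodeForNumber(parsedNumber);
console.log('Country code:', countryCode);
// Country code: US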
9. Phone Number Lookup
Developer APIs such as Twilio and Telnyx offer endpoints that enable you to look up data related to a given phone number, including the carrier's Mobile Country Code (MCC) and Mobile Network Code (MNC). Some APIs expose the actual country code and carrier name directly, but if they don't, you can use a lookup table to translate MCC and MNC, such as https://github.com/musalbas/mcc-mnc-table.
Be aware that these lookup APIs can be more on the expensive side, but the powerful ones also return data such as porting history, roaming status, and whether the number is disposable (VoIP), so it might be worth the investment.
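When an API only returns the MCC, translating it to a country is a simple table lookup. The sketch below uses a handful of well-known MCC values as an illustrative subset; `mccToCountry` and `countryFromMcc` are names invented for this example, and a full table such as the one linked above would replace the hard-coded entries.

```javascript
// Illustrative subset of Mobile Country Codes mapped to ISO country codes.
const mccToCountry = new Map([
  ['240', 'SE'],
  ['208', 'FR'],
  ['262', 'DE'],
  ['234', 'GB'],
  ['310', 'US'],
  ['311', 'US'],
]);

// Accepts the MCC as a string or number; returns null when it is not in the table.
function countryFromMcc(mcc) {
  return mccToCountry.get(String(mcc)) ?? null;
}

console.log(countryFromMcc('240')); // SE
console.log(countryFromMcc(310)); // US
```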
10. Social Media Lookup
Although social networks are increasingly shutting down the ability to look up profile details by email address or phone number, a hit may reveal the cities where someone has worked or the locations listed on their profile. Developer-oriented APIs in this space include PeopleDataLabs and Seon.
This information can provide additional data points to help pinpoint a user's location, but keep in mind that the accuracy of this method can vary and may also raise privacy concerns.
11. Keyboard Layout
This final method is more of an experimental feature, but it can be used as an additional check when analyzing all the location indicators. As covered by ghacks.net around the release of Google Chrome 97, websites can use the Keyboard Map API to access information about a user's keyboard layout. For example, users in France typically use the AZERTY layout, while those in the United States use QWERTY. By analyzing the keyboard layout, you may be able to infer the user's location, although this method is far from foolproof.
Similar techniques are available on mobile devices, such as retrieving all configured keyboard layouts, including emoji. However, it shares the same limitations as the web version.
Many browser developers have voiced concerns about the privacy implications of this feature, and as a result, some browsers will not implement or will disable the API.
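For browsers that do expose the API, a minimal sketch of the layout check might look like this. `navigator.keyboard.getLayoutMap()` is the real entry point; the `classifyKeyboardLayout` and `detectKeyboardLayout` helpers are hypothetical, and the classification only looks at the first six letter keys.

```javascript
// Classifies a layout from the characters produced by a few physical keys.
// `layoutMap` is anything with a get(code) method, such as the
// KeyboardLayoutMap resolved by navigator.keyboard.getLayoutMap().
function classifyKeyboardLayout(layoutMap) {
  const firstRow = ['KeyQ', 'KeyW', 'KeyE', 'KeyR', 'KeyT', 'KeyY']
    .map((code) => layoutMap.get(code) ?? '?')
    .join('')
    .toUpperCase();
  if (firstRow === 'QWERTY') return 'QWERTY';
  if (firstRow === 'QWERTZ') return 'QWERTZ';
  if (firstRow === 'AZERTY') return 'AZERTY';
  return 'unknown';
}

// Resolves to a layout name, or null when the API is unavailable.
async function detectKeyboardLayout() {
  if (typeof navigator === 'undefined' || !navigator.keyboard?.getLayoutMap) {
    return null;
  }
  const layoutMap = await navigator.keyboard.getLayoutMap();
  return classifyKeyboardLayout(layoutMap);
}
```

A layout only narrows things down to a group of countries (AZERTY is common in France and Belgium, for instance), so treat the result as a weak signal.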
Predicting the True Location: Putting It All Together
To sum up, there is no sure-fire way to pinpoint a user's exact location without their consent. Instead, we must rely on a set of less granular data points. While these individual points might not provide clear insights on their own, combining them can reveal a clearer picture, provided that enough location indicators have been collected.
Start by normalizing all data points into standard ISO country codes:
const countryCodes = {
  ipGeolocation: 'US',
  timezone: ['SE'],
  languages: ['US', 'SE'],
  phoneCarrier: 'SE',
  addressGeocoding: ['US', 'SE'],
  nativeLocation: null,
  emailDomain: null,
  phoneNumberFormat: 'US',
  phoneNumberLookup: 'US',
  socialMediaLookup: null,
  keyboardLayout: 'US'
};
In this example, the user seems to have a US-based IP address and has provided a US phone number. However, one of the provided street addresses is located in Sweden, and the timezone, language settings, and phone carrier also point to Sweden.
Predicting the exact location from this information is virtually impossible. Instead, focus on generating two variables, predictedCountryCode and locationConsistency, which you can then use in your rules engine logic.
One implementation of predictedCountryCode could involve selecting the dominant country code, while locationConsistency could be the percentage that this country code makes up of the total.
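A minimal sketch of that idea, assuming every country code an indicator reports counts as one vote and that missing indicators are simply skipped, could look like this. The `predictLocation` name and the voting rule are assumptions made for illustration, not a prescribed implementation.

```javascript
// Counts every reported country code as one vote, ignoring null indicators,
// then returns the most frequent code and its share of all votes.
function predictLocation(countryCodes) {
  const votes = Object.values(countryCodes)
    .flat()
    .filter((code) => typeof code === 'string');
  if (votes.length === 0) {
    return { predictedCountryCode: null, locationConsistency: 0 };
  }
  const counts = new Map();
  for (const code of votes) counts.set(code, (counts.get(code) ?? 0) + 1);
  // On a tie, the code that was seen first wins.
  const [predictedCountryCode, topCount] = [...counts.entries()]
    .sort((a, b) => b[1] - a[1])[0];
  return { predictedCountryCode, locationConsistency: topCount / votes.length };
}

const indicators = {
  ipGeolocation: 'US',
  timezone: ['SE'],
  languages: ['US', 'SE'],
  phoneCarrier: 'SE',
  addressGeocoding: ['US', 'SE'],
  nativeLocation: null,
  emailDomain: null,
  phoneNumberFormat: 'US',
  phoneNumberLookup: 'US',
  socialMediaLookup: null,
  keyboardLayout: 'US',
};
console.log(predictLocation(indicators));
// { predictedCountryCode: 'US', locationConsistency: 0.6 }
```

With the example data, six of the ten votes are for the US, so a consistency of 0.6 signals that a sizable share of the indicators disagree.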
These are simplistic methods, but they serve as a general direction. To enhance this approach, consider adding weights to each type of location data based on its reliability and maintaining individual weights for each country based on its uniqueness. The more unique a country is, the higher the likelihood that it is the user's true location and not just someone with English language settings.
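The per-indicator weighting could be sketched as below. The weight values in `indicatorWeights` are invented placeholders that you would tune against your own data, and the sketch leaves out the per-country uniqueness weights mentioned above.

```javascript
// Hypothetical reliability weights per indicator; unlisted indicators get 1.
const indicatorWeights = {
  phoneCarrier: 3,
  ipGeolocation: 2,
  timezone: 2,
  languages: 1,
};
const DEFAULT_WEIGHT = 1;

function predictWeightedLocation(countryCodes, weights = indicatorWeights) {
  const scores = new Map();
  let total = 0;
  for (const [indicator, value] of Object.entries(countryCodes)) {
    // Accept a single code, an array of codes, or null.
    const codes = [value].flat().filter((code) => typeof code === 'string');
    if (codes.length === 0) continue;
    // Split an indicator's weight evenly across the countries it suggests.
    const share = (weights[indicator] ?? DEFAULT_WEIGHT) / codes.length;
    for (const code of codes) {
      scores.set(code, (scores.get(code) ?? 0) + share);
      total += share;
    }
  }
  if (total === 0) return { predictedCountryCode: null, locationConsistency: 0 };
  const [predictedCountryCode, topScore] = [...scores.entries()]
    .sort((a, b) => b[1] - a[1])[0];
  return { predictedCountryCode, locationConsistency: topScore / total };
}

console.log(predictWeightedLocation({ ipGeolocation: 'US', phoneCarrier: 'SE' }));
// { predictedCountryCode: 'SE', locationConsistency: 0.6 }
```

Here the carrier outweighs the IP address, so the prediction flips to Sweden even though each country is reported by exactly one indicator.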
Also, ensure that you collect as much data across the user's session as possible, which will offer a greater likelihood of collecting multiple locations and device settings for the same user over time.
Sign up for a 30-day risk-free trial and let me know what you think!