A mysterious marketing database containing the personal details of an estimated 35 million people was exposed on the web without a password, Comparitech researchers report. The database included names, contact information, home addresses, ethnicities, and a wealth of demographic information ranging from hobbies and interests to shopping habits and media consumption.
The sample of files viewed by Comparitech researchers indicated a majority of the records pertained to residents of Chicago, Los Angeles, and San Diego, and their surrounding areas.
The database could be accessed in full by anyone with a web browser and an internet connection. The information in the database could be used for targeted spam, scams, and phishing campaigns. It also threatens the privacy of people who do not want their personal details, including address and/or contact information, publicized.
Timeline of the exposure
Bob Diachenko, head of Comparitech’s cybersecurity research team, discovered the database on June 26, 2021. We do not know how long it had been exposed before then.
After exhausting all means at our disposal, we were unable to identify the database’s owner. Diachenko resorted to contacting Amazon Web Services, which hosted the database’s server, to request that it be taken down.
The data was accessible until July 27, 2021.
In total, the information was exposed for at least a month. Our honeypot experiments show cybercriminals can find and access unsecured databases like this one in a matter of hours.
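As a quick check on that minimum figure, the two dates in the timeline above can be subtracted directly. This is only an illustrative sketch: the variable names are ours, and it assumes the database stayed exposed continuously between the discovery date and the last date it was accessible.

```python
from datetime import date

# Dates taken from the timeline above.
discovered = date(2021, 6, 26)    # database first seen by researchers
access_ended = date(2021, 7, 27)  # last date the data was reported accessible

# Minimum known exposure. The true figure may be longer, because the
# start of the exposure is unknown.
min_exposure_days = (access_ended - discovered).days
print(f"Known exposure window: at least {min_exposure_days} days")
```

The result is a lower bound only, since we do not know when the database first became publicly reachable.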
What data was exposed?
The Elasticsearch database was hosted on Amazon Web Services and accessible through a public-facing Kibana interface that required no authentication for access. It contained more than 35 million records in total. Each of those records contained all or some of the following information:
- Full name
- Home address
- Birth date
- Phone number
- Email address
- Ethnicity
- Gender
- Marital status
- Occupation
- Categorical demographic data. These are indicators of the data subject’s:
  - interests (automobiles, wine, knitting, etc)
  - media consumption (PC gamer, satellite TV subscriber, audiobook listener, etc)
  - estimated income
  - estimated net worth
  - pet ownership
  - property information (estimated home value, purchase date, has pool, etc)
  - lifestyle (athletic, well off, high tech, etc)
  - purchasing habits (credit card tier, buys jewelry, number of credit lines, etc)
  - affiliations (types of charities, political party, etc)
Each person’s record contains 268 fields of information, so we won’t go through them all here.
Most of the data subjects appear to be residents of Illinois and California, though there are a few linked to surrounding states. Comparitech contacted a small number of data subjects using the exposed names and phone numbers to verify the information in the database was genuine.
Each record in the database also contains an eight- or nine-digit ID number. At first glance, some of these appear to be Social Security numbers, but after further investigation we no longer believe that to be the case. Nonetheless, we still urge DuPage County residents to err on the side of caution and report any incidents of attempted identity theft to the FTC.
No financial information or passwords were in the database.
Where did the data come from?
We do not know.
We have not been able to uncover any evidence that indicates who owns the data. The organizations we approached as likely owners denied the data belonged to them. Our only clue is that the time zone of the hosting server is set to Kolkata, India.
Timestamps in the database indicate the information started being gathered as early as 2010. Existing information was updated and new information added as recently as May 2021.
The data was most likely intended for marketing purposes.
A significant portion of the records include a field called “source domain” that might hint at the information’s origin. The field often contained website domains where the data could have been originally harvested. The websites were often dodgy if not outright scams: rent-to-own homes, cruise giveaways, money advances, cash sweepstakes, etc. So it seems plausible that this is a spam or scam marketing database.
But as to the identity of the person or organization who aggregated all of the data and ultimately exposed it on the web, we don’t know.
Dangers of exposed information
The combination of demographic data and contact information is a gold mine for spammers and scammers. They can use the information to contact victims with personalized emails, texts, and calls. Chicago, Los Angeles, and San Diego residents should be on the lookout for scams and phishing schemes.
Never click a link in an unsolicited email and always verify the sender’s identity before providing any personal or financial information.
The information also threatens the privacy of people who don’t want their names, contact information, and addresses publicized: domestic abuse victims, undocumented immigrants, judges, lawyers, and former criminals, to name a few.
Why we reported this data incident
Comparitech’s cybersecurity research team routinely scans the internet for unprotected databases containing personal information. When we find an exposed database, we immediately begin investigating who is responsible for it, who might be impacted, what data is exposed, and the potential impact on end users.
After identifying whoever is responsible for the data, we immediately alert them in accordance with our responsible disclosure policy. As soon as the data is secured and our investigation is complete, we publish an article like this one to raise awareness and curb harm to end users. In this case, after failing to identify the owner, we alerted hosting provider Amazon Web Services, which contacted the owner on our behalf.
Previous data incident reports
Comparitech has found and reported on several data incidents like this one, including:
- Cybersecurity company exposes 5 billion records from previous data breaches
- British Gas software vendor exposes 3.6 million customer email addresses
- India visa agency exposes 6,500 travelers’ visa applications on the web
- Utah COVID-19 testing service exposes 50,000 patients’ photo IDs, personal info
- Car dealer marketing service Friendemic exposes 2.7 million consumer records
- Gym chain Town Sports exposes 600,000 records of members and staff
- Prison phone service Telmate exposes messages, personal info of millions of inmates
- Social media data broker exposes nearly 235 million scraped profiles
- UFO VPN exposes millions of logs including user passwords
- 42 million Iranian “Telegram” phone numbers and user IDs were breached
- Details of nearly 8 million UK online purchases leaked
- 250 million Microsoft customer support records were exposed online
- More than 260 million Facebook credentials were posted to a hacker forum
- Almost 3 billion email addresses leaked, many with corresponding passwords
- Detailed information on 188 million people was held in an unsecured database
- Over 2.5 million CenturyLink customer records leaked