Some top 100,000 websites collect everything you type—before you hit submit

When you sign up for a newsletter, make a hotel reservation or check out online, you probably take it for granted that if you mistype your email address three times or change your mind and X out of the page, it does not matter. Nothing actually happens until you press the Submit button, right? Well, maybe not. As with so many assumptions about the web, this is not always the case, according to new research: A surprising number of websites collect some or all of your data as you type it into a digital form.

Researchers from KU Leuven, Radboud University and the University of Lausanne crawled and analyzed the top 100,000 websites, looking at two scenarios: a user visiting a site from the European Union and a user visiting a site from the United States. They found that 1,844 websites collected an EU user’s email address without their consent, and an astonishing 2,950 logged a US user’s email in some form. Many of the sites do not seem to intend to log the data but incorporate third-party marketing and analytics services that cause the behavior.

After crawling websites specifically for password leaks in May 2021, the researchers also found 52 websites where third parties, including the Russian technology giant Yandex, collected password data before submission. The group disclosed its findings to these sites, and all 52 cases have since been resolved.

“If there is a Submit button on a form, the reasonable expectation is that it does something – that it sends your data when you click on it,” says Güneş Acar, a professor and researcher in Radboud University’s digital security group and one of the leaders of the study. “We were super surprised by these results. We thought we might find a few hundred websites where your email is collected before you submit, but this far exceeded our expectations.”

The researchers, who will present their results at the Usenix Security Conference in August, say they were inspired to investigate what they call “leaking forms” by media reports, particularly from Gizmodo, about third parties that collect form data regardless of submission status. They point out that, at its core, the behavior is similar to that of so-called keyloggers, which are typically malicious programs that log everything a target types. But on a mainstream top-1,000 website, users will probably not expect to have their information keylogged. And in practice, the researchers saw a few variations of the behavior. Some sites logged data keystroke by keystroke, but many grabbed the complete entry from one field when users clicked on the next.

“In some cases, when you click on the next field, they collect the previous one, like you click on the password field and they collect the email, or you just click anywhere and they collect all the information immediately,” says Asuman Senol, a privacy and identity researcher at KU Leuven and one of the study’s co-authors. “We did not expect to find thousands of sites; and in the United States, the numbers are really high, which is interesting.”
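The variations the researchers describe can be illustrated with a small simulation. This is only a sketch under stated assumptions, not code from the study or from any real tracker: the `Event` type, the `replay` function and the three strategy names are hypothetical, and the event stream is made up. It replays a user who types an email address, moves on to the next field and then abandons the form, and shows what each collection strategy would have sent by that point.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One user action on a form (hypothetical model for illustration)."""
    kind: str             # "key", "blur" (user leaves a field), or "submit"
    field_name: str = ""
    char: str = ""


def replay(events, strategy):
    """Return the (field, value) pairs a hypothetical script would send.

    strategy:
      "keystroke" - send the field's current value after every key press
      "blur"      - send a field's value when the user leaves that field
      "submit"    - send all fields only when the form is submitted
    """
    values = {}   # current contents of each field
    sent = []     # everything transmitted so far
    for ev in events:
        if ev.kind == "key":
            values[ev.field_name] = values.get(ev.field_name, "") + ev.char
            if strategy == "keystroke":
                sent.append((ev.field_name, values[ev.field_name]))
        elif ev.kind == "blur" and strategy == "blur":
            sent.append((ev.field_name, values.get(ev.field_name, "")))
        elif ev.kind == "submit" and strategy == "submit":
            sent.extend(values.items())
    return sent


# The user types an address, clicks into the next field, then leaves the page.
# There is no "submit" event in this stream.
events = [Event("key", "email", c) for c in "a@b.co"] + [Event("blur", "email")]

for name in ("submit", "blur", "keystroke"):
    print(name, replay(events, name))
```

In this toy model only the "submit" strategy sends nothing for an abandoned form. The "blur" strategy sends the complete address once, and the "keystroke" strategy sends every partial value along the way.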

The researchers say that the regional differences may be related to companies being more careful with user tracking, and even potentially integrating with fewer third parties, due to the EU’s General Data Protection Regulation. However, they emphasize that this is only a possibility, and the study did not examine explanations for the difference.

Through a significant effort to notify websites and third parties that collect data in this way, the researchers found that one explanation for some of the unexpected data collection may have to do with the challenge of distinguishing a “submit” action from other user actions on certain web pages. But the researchers emphasize that from a privacy perspective, this is not an adequate justification.

Since completing the paper, the group has also made a discovery about Meta Pixel and TikTok Pixel, invisible marketing trackers that services embed on their websites to track users across the web and show them ads. Both claimed in their documentation that customers could enable “automatic advanced matching”, which would trigger data collection when a user submitted a form. But in practice, the researchers found that these tracking pixels captured hashed email addresses, an obscured version of email addresses used to identify web users across platforms, before the form was submitted. For US users, 8,438 sites may have leaked data to Meta, Facebook’s parent company, via pixels, and 7,379 sites may have been affected for EU users. For TikTok Pixel, the group found 154 sites for users in the US and 147 for users in the EU.
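To see why a hashed email address can still identify a user across platforms, consider the sketch below. It is an illustration only: the article does not say which hash function or normalization the pixels use, so SHA-256 over a trimmed, lowercased address is an assumption here, and `hash_email` is a hypothetical helper name.

```python
import hashlib


def hash_email(address: str) -> str:
    """Hash an email address (assumed scheme: trim, lowercase, SHA-256)."""
    normalized = address.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# The same address typed on two different sites produces the same digest,
# so the digest works as a stable identifier even though it hides the text.
site_a = hash_email("Alice@Example.com ")
site_b = hash_email("alice@example.com")
print(site_a == site_b)
```

Because the hash is deterministic, anyone who receives the digest from several sites can match the records to each other without ever seeing the address in plain text.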

The researchers submitted a bug report to Meta on March 25 and the company quickly assigned an engineer to the case, but the group has not heard any update since. The researchers notified TikTok on April 21 (they discovered the TikTok behavior more recently) and have not heard back. Meta and TikTok did not immediately return WIRED’s request for comment on the results.

“The privacy risks for users are that they will be tracked even more efficiently; they can be tracked on different websites, across different sessions, on mobile and desktop,” says Acar. “An email address is such a useful identifier for tracking, because it’s global, it’s unique, it’s constant. You can’t clear it like you clear your cookies. It is a very powerful identifier.”

Acar also points out that as technology companies try to phase out cookie-based tracking in a nod to privacy issues, marketers and other analysts will increasingly rely on static IDs such as phone numbers and email addresses.

Because the results suggest that deleting data in a form before submitting it may not be enough to protect you from all collection, the researchers created a Firefox extension called LeakInspector to detect rogue form collection. And they say they hope their results will raise awareness of the issue, not just for regular web users but for website developers and administrators, who can proactively check whether their own systems or any of the third parties they use collect data from forms without consent.

Leaking forms are just another type of data collection to be wary of in an already extremely crowded online field.

This story originally appeared on wired.com.
