In December 2021, Robin Kok wrote a series of tweets about his Elsevier data access request. I did the same a few days later. This is the resulting collaborative blog post, summarizing our journey in trying to understand what data Elsevier collects, what data Elsevier has collected on the two of us specifically, and how we tried to get this data deleted. A PDF version of this blog post is also available.
Elsevier, data kraken
Everybody in academia knows Elsevier. Even if you think you don’t, you probably do. Not only do they publish over 2,500 scientific journals, but they also own the citation database Scopus, as well as the ScienceDirect collection of electronic journals from which you get your papers. That nifty PURE system your university wants you to use to keep track of your publications and projects? You guessed it: Elsevier. And what about that marvelous reference manager, Mendeley? Elsevier bought it in 2013. The list goes on and on.
But what exactly is Elsevier? We follow the advice of an Elsevier spokesperson: “if you think that information should be free of charge, go to Wikipedia”. Let’s do that! Wikipedia, in their core summary section, introduces Elsevier as “a Netherlands-based academic publishing company specializing in scientific, technical, and medical content.”
The intro continues:
And it’s not just rent-seeking. Elsevier admitted to writing “sponsored article compilation publications, on behalf of pharmaceutical clients, that were made to look like journals and lacked the proper disclosures”; offered Amazon vouchers to a select group of researchers to submit five-star reviews on Amazon for certain products; manipulated citation reports; and is one of the leading lobbyists against open access and open science efforts. For this, Elsevier’s parent company, RELX, even employs two full-time lobbyists in the European Parliament, feeding “advice” into the highest levels of legislation and science organization. Here is a good summary of Elsevier’s problematic practices; suffice it to say that they’re very good at making profits.
As described by Wikipedia, one way to make profits is Elsevier’s business as an academic publisher. Academics write articles for Elsevier journals for free and hand over copyright; other academics review and edit these papers for free; and Elsevier then sells these papers back to academics. Much of the labor that goes into Elsevier products is funded by public money, only for Elsevier to sell the finished products back e.g. to university libraries, using up even more public money.
But in the 2020s—and now we come to the main topic of this piece—there is a second way of making money: selling data. Elsevier’s parent company RELX bills itself as “a global provider of information-based analytics and decision tools for professional and business customers”. And Elsevier itself has been busy with rebranding, too:
This may sound irrelevant to you as a researcher, but here we show how Elsevier helps RELX monetize your data, how much data they have on you, and why it will take major steps to change this troubling situation.
Data access request
Luckily, folks over at Elsevier “take your privacy and trust in [them] very seriously”, so we used the Elsevier Privacy Support Hub to start an “access to personal information” request. Being in the EU, we are legally entitled under the European General Data Protection Regulation (GDPR) to ask Elsevier what data they have on us, and submitting this request was easy and quick.
After a few weeks, we both received responses by email. We had been assigned numbers 0000034 and 0000272 respectively, perhaps implying that relatively few people have made use of this system yet. The emails contained several files with a wide range of our data, in different formats. One of the attached Excel files had over 700,000 cells of data, going back many years, exceeding 5 MB in file size. We want to talk you through a few examples of what Elsevier knows about us.
They have your data
To start with, of course they have information we have provided them with in our interactions with Elsevier journals: full names, academic affiliations, university e-mail addresses, completed reviews and corresponding journals, times when we declined review requests, and so on.
Apart from this, there was a list of IP addresses. Checking these IP addresses identified one of us in the small city we live in, rather than where our university is located. We also found several personal user IDs, which is likely how Elsevier connects our data across platforms and accounts. We were also surprised to see multiple (correct) private mobile phone numbers and e-mail addresses included.
And there is more. Elsevier tracks which emails you open, the number of links per email clicked, and so on.
We also found our personal address and bank account details, probably because we had received a small payment for serving as a statistical reviewer. These €55 sure came with a privacy cost larger than anticipated.
Data called “Web Traffic via Adobe Analytics” appears to list which websites we visited, when, and from which IP address. “ScienceDirect Usage Data” contains information on when we looked at which papers, and what we did on the corresponding website. Elsevier appears to distinguish between downloading or looking at the full paper and other types of access, such as looking at a particular image (e.g. “ArticleURLrequestPage”, “MiamiImageURLrequestPage”, and “MiamiImageURLreadPDF”), although it’s not entirely clear from the data export. This leads to a general issue that will come up more often in this piece: while Elsevier shared what data they have on us, and while they know what the data mean, it was often unclear to us, navigating the data export, what the data mean. In that sense, the usefulness of the current data export is, at least in part, questionable. In the extreme, it’s a bit like asking Google what they know about you and being sent a file full of special characters that have no meaning to you.
Going back to what data they have, next up: Mendeley. Like many, both of us have used this reference manager for years. For one of us, the corresponding tab in the Excel file from Elsevier contained a whopping 213,000 lines of data, from 2016 to 2022. For the other, although he also used Mendeley extensively for years, the data export contained no information on Mendeley data whatsoever, a discrepancy for which we could not find an explanation. Elsevier appears to log every time you open Mendeley, and many other things you do with the software: we found field codes such as “OpenPdfIn InternalViewer”, “UserDocument Created”, “DocumentAnnotation Created”, “UserDocument Updated”, “FileDownloaded”, and so on.
They use your data
Although many of these data points seem relatively innocent at first, they can easily be monetized, because you can extrapolate core working hours, vacation times, and other patterns of a person’s life. This can be understood as detailed information about the workflow of academics – exactly the thing we would want to know if, like Elsevier, our goal was to be a pervasive element in the entire academic lifecycle.
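To make this concrete, here is a minimal sketch of how little it takes to turn a list of usage timestamps into a rough picture of working hours and absences. It is purely illustrative: we do not know how Elsevier processes these logs, and the function name `activity_profile`, the `min_gap_days` threshold, and the sample timestamps are all our own made-up assumptions rather than anything taken from the data export.

```python
from collections import Counter
from datetime import datetime, timedelta


def activity_profile(timestamps, min_gap_days=7):
    """Summarise usage timestamps into an hour-of-day histogram and long idle gaps.

    `timestamps` is a list of ISO 8601 strings (a hypothetical format, not the
    actual export layout). A gap is reported when at least `min_gap_days`
    consecutive days show no activity at all.
    """
    events = sorted(datetime.fromisoformat(ts) for ts in timestamps)

    # How often each hour of the day appears: a proxy for core working hours.
    by_hour = Counter(event.hour for event in events)

    # Runs of days without any event: a proxy for vacations or other absences.
    active_days = sorted({event.date() for event in events})
    gaps = []
    for previous, current in zip(active_days, active_days[1:]):
        idle_days = (current - previous).days - 1
        if idle_days >= min_gap_days:
            first_idle = previous + timedelta(days=1)
            last_idle = current - timedelta(days=1)
            gaps.append((first_idle, last_idle))
    return by_hour, gaps


# Made-up events, e.g. one row per "OpenPdfIn InternalViewer" entry.
sample_events = [
    "2021-06-01T09:05:00",
    "2021-06-01T14:30:00",
    "2021-06-02T09:45:00",
    "2021-06-03T10:10:00",
    "2021-06-21T09:20:00",
    "2021-06-22T16:00:00",
]
hours, gaps = activity_profile(sample_events)
print("Most active hours:", hours.most_common(3))
for first_idle, last_idle in gaps:
    print("No activity from", first_idle, "to", last_idle)
```

With hundreds of thousands of logged events per person instead of six, a summary like this would become correspondingly more detailed.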
This interest in academic lifecycle data is not surprising, given the role of Elsevier’s parent company RELX as a global provider of information-based analytics and decision tools, as well as Elsevier’s rebranding towards an Information Analytics Business. Collecting data comes at a cost for a company, and it is safe to assume that they wouldn’t gather data if they didn’t intend to do something with it.
One of the ways to monetize your data is painfully obvious: old-school spam email tactics such as trying to get you to use more Elsevier services by signing you up for newsletters. Many academics receive unending floods of unsolicited emails and newsletters from Elsevier, which prompted one of us to do the subject access request in the first place. In the data export, we found a huge list of highly irrelevant newsletters we were unknowingly subscribed to. For one of us, the corresponding part of the data on “communications” has over 5,000 rows.
You agreed to all of this?
Well, actually, now that you ask, we don’t quite recall consenting to Mendeley collecting data that could be used to infer information on our working hours and vacation time. After all, with this kind of data, it is entirely possible that Elsevier knows our work schedule better than our employers. And what about the unsolicited emails that we received even after unsubscribing? For most of these, it’s implausible that we would have consented. As you can see in the screenshot above, during one day (sorry, night!), at 3:20am, within a single minute, one of us “signed up” to no fewer than 50 newsletters at the same time – nearly all unrelated to our academic discipline.
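If you want to look for such bursts in your own export, a rough sketch could look like the following. It assumes the sign-up times have already been pulled out of the “communications” data into ISO 8601 strings; the function name `bulk_signup_minutes`, the `threshold` value, and the sample dates are hypothetical and only serve as illustration.

```python
from collections import Counter
from datetime import datetime


def bulk_signup_minutes(signup_times, threshold=10):
    """Return the minutes in which at least `threshold` sign-ups were recorded.

    `signup_times` is a list of ISO 8601 strings (an assumed format; the real
    export may need converting first).
    """
    per_minute = Counter(
        # Truncate each timestamp to the minute so sign-ups can be grouped.
        datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        for ts in signup_times
    )
    return {minute: count for minute, count in per_minute.items() if count >= threshold}


# Made-up data: 50 sign-ups within one minute at 3:20am, plus one ordinary sign-up.
sample = [f"2020-01-15T03:20:{second:02d}" for second in range(50)]
sample.append("2020-02-01T11:00:00")

flagged = bulk_signup_minutes(sample)
for minute, count in flagged.items():
    print(minute, "had", count, "sign-ups")
```

A human clicking through subscription forms is unlikely to produce dozens of sign-ups in a single minute, so any minute flagged this way is worth a closer look.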
Does Elsevier really have our consent for these and other types of data they collected? The data export seems to answer this question, too, with aptly named columns such as “no consent” and “unknown consent”, the 0s and 1s probably marking “yes” or “no”.
You can check-out any time you like…?
Elsevier knows a lot about us, and the data they sent us in response to our access request may only scratch the surface. Although they sent a large volume of data, inconsistencies we found (like missing Mendeley data from one of us) make us doubt whether it is truly all the data they have. What to do? The answer seems straightforward: we can just stop donating our unpaid time and our personal and professional data, right? Indeed, more than 20,000 researchers have already taken a stand against Elsevier’s business practices, by openly refusing to publish in (or review / do editorial work for) Elsevier.
But that does not really solve the problem we’re dealing with here. A lot of your data Elsevier might monetize is data you cannot really avoid providing as an academic. For example, many of you will access full texts of papers through the ScienceDirect website, which often requires an institutional login. Given that the login is uniquely identifiable, they know exactly which papers you’ve looked at, and when. This also pertains to all of the other Elsevier products, some of which we briefly mentioned above, as well as emails. Many emails may be crucial for you (e.g. from an important journal), and Elsevier logs what emails you open and whether you click on links. Sure, this is probably standard marketing practice and Elsevier is not the only company doing it, but it doesn’t change the fact that as an active academic, you basically cannot avoid giving them data they can sell. In fact, just nominating someone for peer review can be enough to get them on their list. Did you ever realize that for most reviews you’re invited to, you actually never consented to being approached by the given journal?
Elsevier has created a system where it seems impossible to avoid giving them your data. Dominating or at least co-dominating the market of academic publishing, they exploited free labor of researchers, and charged universities very high amounts of money so researchers could access scientific papers (which, in part, they wrote, reviewed and edited themselves). This pseudo-monopoly made Elsevier non-substitutable, which now allows their transition into a company selling your data.
Worse, they say that “personal information that is integral to editorial history will be retained for as long as the articles are being made available”, as they write in their supporting information document on data collection and processing we received as part of the access request. What data exactly are integral to editorial history remains unclear.
If not interacting with Elsevier is not a sustainable solution in the current infrastructure, maybe some more drastic measures are required. So one of us took the most drastic step available on Elsevier’s privacy hub: a deletion of personal information request.
This was also promptly handled, but leaves two core concerns. First, it is not entirely clear to us what information was retained by Elsevier, for example, because they consider it “integral to editorial history”. And second, how sustainable is data deletion if all it takes to be sucked back into the Elsevier data ecosystem again is one of your colleagues recommending you as a reviewer for one of the 600,000 articles Elsevier publishes per year?
Some of the issues mentioned here, such as lack of consent, seem problematic to us from the perspective of e.g. European data protection laws. Is it ok for companies to sign us up to newsletters without consent? Is it ok to collect and retain personal data indefinitely because Elsevier argues it is necessary?
And when Elsevier writes in the supporting information that they do “not undertake any automated decision making in relation to your personal information” (which may violate European laws), can that be true when they write, in the same document, that they are using personal information to tailor experiences? “We are using your personal data for […] enhancing your experience of those products, for example by providing personalized recommendations based on your use of the products.”
We are not legal scholars, and maybe there is no fire here. But from where we stand, there seems to be an awful lot of smoke. We hope that legal and privacy experts can bring clarity to the questions we raise above—because we simply don’t know what to do about a situation that is becoming increasingly alarming.
Thanks to Björn Brembs for comments on an earlier version.
You can choose to not use Mendeley, but as noted in the article it’s very hard to choose not to interact with Elsevier at all, especially for full time academics.
Yes, I highly recommend Zotero and switched from Mendeley about half a year ago, without any problems.
Zotero is free to use. I have always learned that if something is for free, you are the product!
Have you investigated what Zotero does with your data? They might even sell it to Elsevier.
Good to be careful about free tools that make you the product, but Wikipedia (a free product) is a good lesson here, and in this particular case is helpful to understand why, using free Zotero, you are not the product.
Perhaps a partial solution is to ask the company to delete your data every year or every six months, in the same way that some users delete all their social media data on a regular basis.
Just to correct a little mistake: I am a researcher and when you say “Academics write articles for Elsevier journals for free and hand over copyright” that’s not exactly true as we (in biology at least), need to pay for them to publish our data.
And the fees can be huge. As an example, I recently published a review with them in a small impact factor journal (IF = 6) and I was charged 3080£ (GBP) (=3600€). If you want to publish in Nature, the fees go up to 9500€. Of note, when you wish to publish in open access (what I did), you usually double (or more) the price of the “classical” publication fee and this is the case for every publisher…
So, in summary, researchers pay for their data to be published (= hand over copyright) and then we (usually through institutions and universities) need to pay to have access to papers. So please, do not pay to read an article, if you write to the authors, they are usually very pleased to send you their article for free!
You’re right! I’ve actually tried to summarize these issues here: https://eiko-fried.com/academia-in-the-upside-down-of-publishing/
Thanks, and what was your conclusion about your trial as an editor?
I wasn’t really very involved with the journal, honestly. Maybe they forgot about me ^^ … but having worked with several journals, I do think that folks can influence the system from within, and that it may be worth doing so for e.g. society journals.
I don’t understand the question, sorry. Can you elaborate?
IANAL, but I have some experience with making (successful) complaints to data protection agencies. A lot of what you write sounds like clearly illegal practice, and certainly unethical practice.
The key difference between legal and illegal is normally whether the data they collect is personally identifiable, that is, while all companies track you and collect your data, the line is crossed as soon as they tie this data to a personal identifier. Or in short, as soon as they can export this data to you, it’s personally tied to you.
For this, explicit consent is required. If you don’t know how you have given consent, you haven’t given consent.
You can also request deletion of all this data, that no more data is being collected, and they have to explain when they don’t delete all your data (it’s not uncommon to get a bullshit explanation much like “needed for editorial tasks”, which is usually easy to challenge). Specifically, you must be able to have an account where they do not collect this personally identifiable data from you.
As long as EU law applies.
I would suggest asking them, and if they decline, reply unsatisfactorily or ignore you, make a complaint to your data protection agency. Usually, this can be done via an online form.
As for people complaining they are being tracked here, I partially agree: pointing out the great injustice and possibly illegal behaviour of Elsevier is not a reason to similarly track people on this page, and maybe looking for a more ethical hosting solution would have been warranted.
Nevertheless, thank you for this very informative article, and I hope this finds more readers.
Thanks, very insightful response. We’ll look into these measures.
In your introduction, you appear to be publishing a PDF with your scientist’s thoughts.
Now, HOW on Earth could you achieve THAT without a “scientific” publisher like Elsevier taking over the most difficult tasks in such an endeavour, like… converting your word processor output to a PDF, inserting it into a listing of available publications, and ensuring nobody can read it if that doesn’t earn them some money?!
Thank you for your report.
Even if it might not be surprising for everyone, it is important to remind people over and over how data privacy is violated.
The most important point to me was that you don’t have the knowledge to discover all of the technical details, and that makes these nasty business models so dangerous.
We can report only what we see and what we understand.
However, there is much more to discover and people don’t even notice the manipulations on all kind of communication levels.
Maybe you are not manipulated but your trusted friend you are talking to while having a beer.
I like the wise words of Herbert W. Franke:
Man is a being whose actions are largely controlled in the unconscious by the feelings.
We shouldn’t try to be smarter than the immense machine power. It doesn’t need AI or ML to get manipulated. Big data is all you need to control people’s behavior.
So many people give up resisting data privacy violations because they don’t have the knowledge, the energy, or the time, and feel exhausted just talking about it. Who wants to fight in his spare time against big data companies?
What a hypocrite! This site doesn’t allow me to opt out of tracking cookies and you are trying to farm my email address when I leave comments.
Who are you selling my data to?
This is a small site of an academic. It costs me around 200 euros per year to host and maintain. I don’t make any money by ads or by selling data. And commentators need not leave an email address, so there is certainly no “email farming”.
This site is based on WordPress; I don’t have the technical skills to set up a website from scratch myself. For full transparency, I have a dedicated page on privacy: https://eiko-fried.com/privacy/. If you know of ways to allow opting out of all tracking using WordPress infrastructure, let me know.
Browser addons usually are a good way to opt out even before accessing a site.
Personally I do use uBlock Origin and uMatrix.
But those only work client side, it’s nothing a site owner could globally set up.
P.S. @”Anonymous”: You are FAKE news, a blatant liar.
As you show yourself by not using your name (and surely a fake email as well) there is nothing “harvested” here but at best what you freely provide.
OK you win the grand prize for unintentional irony.
I take back the email farming comment. When I tried to comment first I was not able to comment but when I did leave one the comment went through. Probably a glitch in the matrix.
My point about cookies still stands though. Your cookie banner doesn’t allow me to opt out of cookie collection at all. If I were a tin hat wearing conspiracy theorist I would think that it was deliberate. I agree with Some Guy’s comment above. If you want to call out bad practices you need to practice what you preach.
Sorry but this is just a wrong-headed criticism of Elsevier. I’m no fan of them either but these are not unusual or egregious practices.
Email open tracking? Good luck subscribing to a newsletter that doesn’t track email opens or link clicks. Even the university all staff newsletter does it.
Tracking your usage of their services? They’d be negligent not to. And retaining your bank details after you gave it to them so they could pay you? Probably required by law.
I’m a big privacy nut – I use ad blockers, Signal, Protonmail and don’t have a Facebook account. But what’s described in this post is not bad – it’s all data they collected directly from you (phone number is often required when opening an author or reviewer account) or from public places on the internet.
If you want privacy from Elsevier, don’t publish with them, read their papers only off sci hub and add their domains to your email block list. But I’m guessing that if getting legal threats from APA wasn’t even enough to stop you signing up to their editorial board, that a bit of email tracking won’t stop your next paper going to the big E.
Refreshing to see disagreements on this, cheers. You expand on some obvious points—newsletters, emails, payment information. Indeed nothing surprising there, and they are required to keep financial data for 7 years, I believe.
When it comes to the tricky parts, such as tracking when I open downloaded software, you write “Tracking your usage of their services? They’d be negligent not to.” That seems like a poor defense of problematic practices to me, and one that can be used to defend basically any practice. Google would be negligent not to track router IPs and passwords with their maps cars despite stating they don’t. WhatsApp would be negligent not to have an encryption backdoor despite stating they don’t.
“But I’m guessing that if getting legal threats from APA wasn’t even enough to stop you signing up to their editorial board, that a bit of email tracking won’t stop your next paper going to the big E.”
Sure, I’ll take that. But I do think it’s a little different, because I have some hope that APA / APS can be changed from within, given their scope and mission goals, but big E can’t.
I both agree that tracking is problematic (why I avoid social media) and disagree that Mendeley tracking you is unusual or egregious. If being able to access your library anywhere is part of the offer, that tells you it’s a cloud service. If you’re storing your stuff on their servers then when you use it that’s going to create activity logs. Part of that tracking is legitimate business purposes, from maintaining your account, monitoring for security issues, to improving their app. Part of that might be surveillance capitalism. But you did sign up for it.
To put it another way, it’s like you used my computer to browse the web and now you’re surprised I know what sites you visited. It’s not like they bought your credit file from Equifax and matched it to your publication record and then on sold it to your health insurer.
“Part of that tracking is legitimate business purposes”
No, sorry, but I object!
For example: It is absolutely not ok if a restaurant kept track of when I go there, how long I stayed, what I consumed, where I sat with whom etc. Maybe they could do this in an anonymized manner: to keep track of how many people are in there at any given time, so that they can plan their staff hours accordingly. But it would be in no way acceptable if they kept track of my personal activities.
Same applies to usage of cloud services.
It would be simply way too dangerous, society-wise, if we allowed this. The knowledge of this danger is as old as the 1970s. You could start with Michel Foucault’s theory of the “panopticon” [1, 2] to understand this.
[1] Michel Foucault: Discipline and Punish: The Birth of the Prison.
[2] Michel Foucault: The History of Sexuality Vol. 1: The Will to Knowledge.
OK JM, but imagine there are 2 restaurants. One of them does not track your movements, let’s call it Zotero Desktop. The other, Mendeley, is a members only restaurant where you scan your card with every order. If I join Mendeley I expect them to have my order history, so it’s not egregious. I would have a problem though if Zotero Desktop had telemetry dialling back to base or if there were no restaurants that provided a more privacy respecting service. With Elsevier, we have choices – we can choose not to publish with them or review for them. Using Mendeley is a very free choice because no one knows or cares what citation manager we use.
This is not Foucault’s panopticon because the prisoners are free to leave at any time. They can even ask the guard to delete the tapes.