The recent announcement that review articles and position papers would be rejected already smelled like a shift towards a more "opinionated" stance, and this move smells worse.

The vacuum that arXiv originally filled was one of a glorified PDF hosting service with just enough of a reputation to allow some preprints to be cited in a formally published paper, and with just enough moderation to not devolve into spam and chaos. It has also been instrumental in pushing publishers towards open access (i.e., to finally give up).

Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right. Consider the impression you get when seeing a reference to an arXiv preprint vs. a link to an author's institutional website.

In my view, arXiv fulfills its function better the less power it has as an institution, and I thus have exactly zero trust that the split from Cornell is driven by that function. We've seen the kind of appeasement prose from their statement and FAQ [1] countless times before, and it's now time for the usual routine of snapshotting the site to watch the inevitable amendments to the mission statement.

"What positive changes should users expect to see?" - I guess the negative ones we'll have to see for ourselves.

[1] https://tech.cornell.edu/arxiv/

> Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right.

This has been a common practice in physics, especially the more theoretical branches, since the inception of arXiv. Senior researchers write a draft, send copies to some of their peers, get and incorporate feedback, and just submit to arXiv.

And this is really how it should be. Honestly, the only thing I want arXiv to do is become more like OpenReview: allow comments by peers and better linking to data and project pages.

It works for physics because physicists are very rigorous, so papers don't change very much. It also works for ML because everyone is moving so fast that it's closer to doing open research. Sloppier, but as long as the readers are other experts, it's generally fine.

I think research should really just be open. It helps everyone. The AI slop and mass publishing are exploiting our laziness: we evaluate people on quantity rather than quality. I'm not sure why people are so resistant to making this change. Yes, it's harder, but it has a lot of benefits. And at the end of the day it doesn't matter if a paper is generated, as long as it's actually a quality paper (not just in how it reads, but in the actual research). Slop is slop and we shouldn't want slop regardless. But if we evaluate on quality and everything is open, it becomes much easier to figure out who is producing slop, collusion rings, plagiarism rings, and all that. A little extra work for a lot of benefits. But we seem to be willing to put in a lot of work to avoid doing more work.

I came here to say something similar. As someone who works in a field that applies machine learning but is not purely focused on it, I interact with people who think that arXiv is the only relevant platform and that they don't need to submit their work to any journal, as well as people who still think that preprints don't count at all and that data isn't published until it's printed in an academic journal. It can feel like a clash of worlds.

I think both sides could learn from the other. In the case of ML, I understand the desire to move fast, and an average time to publication of 250-300 days at some of the top-tier journals can feel like an unnecessary burden. But having been on both sides of peer review, I can say there is value to the system, and it has made for better work.

Not doing any of it follows the same spirit as benchmarking your approach against at most one alternative, and even that as an afterthought. Or benchmaxxing without exploring the actual real-world consequences, time and cost trade-offs, etc.

Now, is academic publishing perfect? Of course not, very, very far from it. It desperately needs reform: to keep it economically accessible, to make it time-efficient for authors, editors, and peer reviewers, to prevent the "hot topic of the day" from dominating journals, and to make sure that peer review aligns with the needs of the community and actually improves the quality of the work, rather than serving as "malicious peer review" to sneak in some citations or pet peeves.

Given the power that the ML field holds and the interesting experiments with open review, I would wish for the field to engage more with the scientific system at large and perhaps try to drive reforms and improve it, rather than completely abandoning it and treating a PDF hosting service as a journal (of course, preprints would still be desirable and important, but they cannot carry the entire field alone).

I've noticed it's field dependent. Some fields don't really feel much need to publish in a real journal.

Others (at least in chemistry) will accept it, but it raises concern if a paper is only available as a preprint.

> arXiv fulfills its function better the less power it has as an institution

It is an interesting instance of the rule of least power, https://en.wikipedia.org/wiki/Rule_of_least_power.

> Unfortunately, over the years, arXiv has become something like a "venue" in its own right, ...

In my experience as a publishing scientist, this is partly because publishing with "reputable" journals is an increasingly onerous process, with exorbitant fees, enshittified UIs, and useless reviews. The alternative is to upload to arXiv and move on with your life.

> and with just enough moderation to not devolve into spam and chaos

arXiv has become a target for grifters in other domains like health and supplements. I’ve seen several small-scale health influencers who ChatGPT some “papers” and then upload them to arXiv, then cite arXiv as proof of their “published research”. It’s not fooling anyone who knows how research works, but it’s very convincing to an average person who thinks they’re doing the right thing by following sources that have done academic research.

I’ve been surprised at how bad and obviously grifty some of the documents I’ve seen on arXiv have become lately. Is there any moderation, or is it a free-for-all as long as you can get an invite?

{"deleted":true,"id":47458138,"parent":47454243,"time":1774029133,"type":"comment"}
This is great news for anyone building tools on top of arXiv data. The API (export.arxiv.org/api/) is one of the best free academic data sources: a structured Atom feed with full abstracts, authors, categories, and publication dates.

I've been using it as one of 9 data sources in a market research tool — arXiv papers are a strong leading indicator of where an industry is heading. Academic research today often becomes commercial products in 2-3 years.
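For anyone who wants to poke at it, here's a minimal sketch of querying that endpoint with nothing but Python's standard library. The category (cs.LG), result count, and sort options below are just illustrative choices, not anything prescribed by the API:

    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the feed

    # Build a query for recent papers in an example category (cs.LG).
    params = urllib.parse.urlencode({
        "search_query": "cat:cs.LG",
        "start": 0,
        "max_results": 5,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    })
    url = "https://export.arxiv.org/api/query?" + params

    # Fetch and parse the Atom feed.
    with urllib.request.urlopen(url) as resp:
        feed = ET.parse(resp).getroot()

    # Print date, title, and authors for each entry.
    for entry in feed.findall(ATOM + "entry"):
        title = " ".join(entry.findtext(ATOM + "title", "").split())
        published = entry.findtext(ATOM + "published", "")
        authors = [a.findtext(ATOM + "name", "")
                   for a in entry.findall(ATOM + "author")]
        print(published[:10], "|", title, "|", ", ".join(authors))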

Review papers are interesting.

Bibliometrics reveal that they are highly cited. Internal data we had at arXiv 20 years ago show they are highly read. Reading review papers is a big part of the way you go from a civilian to an expert with a PhD.

On the other hand, they fall through the cracks of the normal methods of academic evaluation.

They create a lot of value for people but they are not likely to advance your career that much as an academic, certainly not in proportion to the value they create, or at least the value they used to create.

One of the most fun things I did on the way to a PhD was writing a literature review on giant magnetoresistance for the experimentalist on my thesis committee. I went from knowing hardly anything about the topic to writing a summary that taught him a lot he didn't know. Given any random topic in any field, you could task me with writing a review paper and I could go out, do a literature search, and write up a summary. An expert would probably get some details right that I'd get wrong, and might have some insights I'd miss, but it's actually a great job for a beginner; it will teach you the field much more effectively than reading a review paper!

How you regulate review papers is pretty tricky. For original research, the criterion of "is it original research?" is an important limit. There might already be 25 review papers on a topic, but maybe I think they all suck (they might), and I can write the 26th and explain it to people the way I wish it had been explained to me.

Now you might say that in the arXiv age there is no limit on pages, but LLMs really do problematize things because they are pretty good at summarization. Send one off on a mission to write a review paper and in some ways it will do better than I do, in other ways worse. Plenty of people have no taste or sense of quality, and they are going to miss the latter; hypothetically people could do better as a centaur, but I think usually they don't, for that reason.

One could make the case that LLMs make review papers obsolete, since you can always ask one to write a review for you or just have conversations about the literature with it. I know I could have spent a very long time studying the literature on heart rate variability before making up my mind about which of the 20 or so metrics I want to build into my application. I did look at some review papers, and I can highlight sentences that support my decisions, but I made those decisions based on a few weekends of experiments and talking to LLMs. The funny thing is that if you went to a conference, met the guy who wrote the review paper, and gave him the hard question of "I can only display one metric in my consumer-facing HRV app, which one do I show?", he would give you the clear answer that isn't in the review paper, and the odds are maybe 70-80% that it would be my answer.

I exited academia for industry 15 years ago, and since then I haven't had nearly as much time to read review papers as I would like. For that reason, my view may be a bit outdated, but one thing I remember finding incredibly useful about review papers is that they provided a venue for speculation.

In the typical "experimental report" sort of paper, the focus is narrowed to a knife's edge around the hypothesis, the methods, the results, and the analysis. Yes, there is the "Introduction" and a "Discussion", but increasingly I saw "Introductions" become a venue for citation bartering (I'll cite your paper in the intro to my next paper if you cite that paper in the intro to your next paper) and "Discussion" turn into a place to float your next grant proposal before formal scoring.

Review papers, on the other hand, were more open to speculation. I remember reading a number that were framed as "here's what has been reported, here's what that likely means...and here's where I think the field could push forward in meaningful ways". Since the veracity of a review is generally judged on how well it covers and summarizes what's already been reported, and since no one is getting their next grant from a review, there's more space for the author to bring in their own thoughts and opinions.

I agree that LLMs have largely removed the need for review papers as a reference for the current state of a field...but I'll miss the forward-looking speculation.

Science is staring down the barrel of a looming crisis that looks like an echo chamber of epic proportions, and the only way out is to figure out how to motivate reporting negative results and sharing speculative outsider thinking.

My feelings about that outsider thing are pretty mixed.

On one hand, I'm the person who implemented the endorsement system for arXiv. I also got a PhD in physics, did a postdoc in physics, then left the field. I can't say that I was mistreated, but I saw one of the stars of the field today crying every night when he was a postdoc, because he was so dedicated to his work and the job market was so brutal. So it really hurts when I see something that I think belittles that.

On the other hand, I am very much an interested outsider when it comes to biosignals, space ISRU, climate change, synthetic biology, and all sorts of things. With my startup and hackathon experience, it is routine for me to look at a lot of literature in a new field, cook it down, realize things are a lot simpler than they look, and build a demo that knocks the socks off the postdocs, because... that's what I do.

But the Riemann Hypothesis, Collatz, dropping the names of anyone who wrote a popular book: I don't do that. What drives me nuts about crackpots is that they are all interested in the same things, whereas real scientists are interested in something different. [1] It was a big part of our thinking about arXiv: crackpot submissions were a tiny fraction of submissions to arXiv overall, but they would have been half the submissions in certain fields like quantum gravity.

I've sat around campfires where hippies were passing a spliff around and talking about that kind of stuff, and I was really amused recently when we found out that Epstein did the same thing with professors who would have known better. I mean, I will use my seduction toolbox to get people like that to say more than they should, but not to have the same conversation I could have at a music festival.

[1] e.g. I think Tolstoy got it backwards!

> Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right. Consider the impression you get when seeing a reference to an arXiv preprint vs. a link to an author's institutional website.

This just isn't true. arXiv is not a venue. There's no place that gives you credit for arXiv papers, and no one cares whether you cite an arXiv paper or some random website. The vast majority of papers that get any kind of attention or citations are published in another venue.

My observation is that research, especially in AI, has left the universities, which now focus less of their research on STEM. It appears research is now done by companies like Meta, OpenAI, Anthropic, Tencent, and Alibaba, among many others.