The recent announcement to reject review articles and position papers already smelled like a shift towards a more "opinionated" stance, and this move smells worse.

The vacuum that arXiv originally filled was the need for a glorified PDF hosting service with just enough of a reputation to allow some preprints to be cited in a formally published paper, and with just enough moderation to not devolve into spam and chaos. It has also been instrumental in pushing publishers towards open access (i.e., to finally give up).

Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right. Consider the impression you get when seeing a reference to an arXiv preprint vs. a link to an author's institutional website.

In my view, arXiv fulfills its function better the less power it has as an institution, and I thus have exactly zero trust that the split from Cornell is driven by that function. We've seen the kind of appeasement prose from their statement and FAQ [1] countless times before, and it's now time for the usual routine of snapshotting the site to watch the inevitable amendments to the mission statement.

"What positive changes should users expect to see?" - I guess the negative ones we'll have to see for ourselves.

[1] https://tech.cornell.edu/arxiv/

> Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right.

This has been a common practice in physics, especially the more theoretical branches, since the inception of arXiv. Senior researchers write a paper draft, send copies to some of their peers, get and incorporate feedback, and then just submit to arXiv.

> and with just enough moderation to not devolve into spam and chaos

arXiv has become a target for grifters in other domains like health and supplements. I’ve seen several small-scale health influencers who ChatGPT some “papers” and then upload them to arXiv, then cite arXiv as proof of their “published research”. It’s not fooling anyone who knows how research works, but it’s very convincing to an average person who thinks that they’re doing the right thing when they follow sources that have done academic research.

I’ve been surprised at how bad and obviously grifty some of the documents I’ve seen on arXiv have become lately. Is there any moderation, or is it a free-for-all as long as you can get an invite?

Review papers are interesting.

Bibliometrics reveal that they are highly cited. Internal data we had at arXiv 20 years ago show they are highly read. Reading review papers is a big part of the way you go from a civilian to an expert with a PhD.

On the other hand, they fall through the cracks of the normal methods of academic evaluation.

They create a lot of value for people but they are not likely to advance your career that much as an academic, certainly not in proportion to the value they create, or at least the value they used to create.

One of the most fun things I did on the way to a PhD was writing a literature review on giant magnetoresistance for the experimentalist on my thesis committee. I went from knowing hardly anything about the topic to writing a summary that taught him a lot he didn't know. Given any random topic in any field, you could task me with writing a review paper and I could go out, do a literature search, and write up a summary. An expert would probably get some details right that I'd get wrong and might have some insights I'd miss, but it's actually a great job for a beginner: writing one will teach you the field much more effectively than reading a review paper!

How you regulate review papers is pretty tricky. For original research, the criterion of "is it original research" is an important limit, but a review has no such filter. There might already be 25 review papers on a topic, but maybe I think they all suck (they might) and I can write the 26th and explain it to people the way I wish it had been explained to me.

Now you might say in the arXiv age there was not a limit on pages, but LLMs really do problematize things because they are pretty good at summarization. Send one off on the mission to write a review paper and in some ways they will do better than I do, in other ways will do worse. Plenty of people have no taste or sense of quality and they are going to miss the latter -- hypothetically people could do better as a centaur but I think usually they don't because of that.

One could make the case that LLMs make review papers obsolete, since you can always ask one to write a review for you or just have conversations about the literature with them. I know I could have spent a very long time studying the literature on Heart Rate Variability and eventually made up my mind about which of the 20 or so metrics I want to build into my application. I did look at some review papers and can highlight sentences that support my decisions, but I made those decisions based on a few weekends of experiments and talking to LLMs. The funny thing is that if you went to a conference, met the person who wrote the review paper, and gave them the hard question of "I can only display one on my consumer-facing HRV app, which one do I show?" they would give you the clear answer that isn't in the review paper, and maybe the odds are 70-80% that it would be my answer.

I'm not sure why we're so focused on filtering what gets into arxiv (which is an uphill battle and DOA at this point) vs fixing the indexing, i.e. the page rank of academia.

Google "sorted out" a messy web with PageRank. Academic papers link to each other. What prevents us from building a ranking from there?

I'm conscious I might be over-simplifying things, but curious to see what I am missing.
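
To make the idea concrete, here is a minimal sketch of PageRank run over a citation graph, assuming the graph is available as a plain mapping from each paper to the papers it cites. The function name, the paper IDs, and the parameter defaults are all made up for illustration; this is not how any existing academic index is known to work, and it ignores real complications such as self-citation rings and the head start older papers get.

```python
def pagerank(citations, damping=0.85, iterations=100, tol=1e-10):
    """Score papers by power-iteration PageRank over a citation graph.

    citations: dict mapping a paper ID to the list of paper IDs it cites
    (hypothetical input format chosen for this sketch).
    """
    # Collect every paper that appears as a citer or as a cited work.
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    papers = sorted(papers)
    n = len(papers)
    if n == 0:
        return {}

    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers that cite nothing spread their rank evenly over all papers.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in papers}
        # Each citing paper splits its rank equally among the papers it cites.
        for citer, cited in citations.items():
            if cited:
                share = damping * rank[citer] / len(cited)
                for p in cited:
                    new_rank[p] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in papers)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy graph with invented paper IDs: A and B cite C, C cites D, E cites A.
citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": [], "E": ["A"]}
ranks = pagerank(citations)
ordering = sorted(ranks, key=ranks.get, reverse=True)
print(ordering)
```

The mechanics are the easy part. The harder questions are the ones the thread is already circling: who gets to count as a node, and how to keep the graph from being gamed.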

I wonder if there are plans to licence the content for AI training
ArXiv is dead. Expect a paywall within three years, or other enshittification and slop added.