ArXiv declares independence from Cornell

https://www.science.org/content/article/arxiv-pioneering-preprint-server-declares-independence-cornell

638 | bookstore-romeo | 14 hours ago | 215 | HN

The recent announcement that they would reject review articles and position papers already smelled like a shift towards a more "opinionated" stance, and this move smells worse.

The vacuum that arXiv originally filled was one of a glorified PDF hosting service with just enough of a reputation to allow some preprints to be cited in a formally published paper, and with just enough moderation to not devolve into spam and chaos. It has also been instrumental in pushing publishers towards open access (i.e., to finally give up).

Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right. Consider the impression you get when seeing a reference to an arXiv preprint vs. a link to an author's institutional website.

In my view, arXiv fulfills its function better the less power it has as an institution, and I thus have exactly zero trust that the split from Cornell is driven by that function. We've seen the kind of appeasement prose from their statement and FAQ [1] countless times before, and it's now time for the usual routine of snapshotting the site to watch the inevitable amendments to the mission statement.

"What positive changes should users expect to see?" - I guess the negative ones we'll have to see for ourselves.

[1] https://tech.cornell.edu/arxiv/

abdullahkhalids 2 hours ago

> Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right.

This has been a common practice in physics, especially the more theoretical branches, since the inception of arXiv. Senior researchers write a paper draft, send copies to some of their peers, get and incorporate feedback, and just submit to arXiv.

Aurornis 5 hours ago

> and with just enough moderation to not devolve into spam and chaos

arXiv has become a target for grifters in other domains like health and supplements. I’ve seen several small-scale health influencers who generate “papers” with ChatGPT, upload them to arXiv, and then cite arXiv as proof of their “published research”. It’s not fooling anyone who knows how research works, but it’s very convincing to an average person who thinks they’re doing the right thing when they follow sources that have done academic research.

I’ve been surprised at how bad and obviously grifty some of the documents I’ve seen on arXiv have become lately. Is there any moderation, or is it a free-for-all as long as you can get an invite?

PaulHoule 5 hours ago

Review papers are interesting.

Bibliometrics reveal that they are highly cited. Internal data we had at arXiv 20 years ago show they are highly read. Reading review papers is a big part of the way you go from a civilian to an expert with a PhD.

On the other hand, they fall through the cracks of the normal methods of academic evaluation.

They create a lot of value for people but they are not likely to advance your career that much as an academic, certainly not in proportion to the value they create, or at least the value they used to create.

One of the most fun things I did on the way to a PhD was writing a literature review on giant magnetoresistance for the experimentalist on my thesis committee. I went from knowing hardly anything about the topic to writing a summary that taught him a lot he didn't know. Given any random topic in any field, you could task me with writing a review paper and I could go out, do a literature search, and write up a summary. An expert would probably get some details right that I'd get wrong and might have some insights I'd miss, but it's actually a great job for a beginner: it will teach you the field much more effectively than reading a review paper!

How you regulate review papers is pretty tricky. For research papers, the criterion "is it original research?" is an important limit. There might already be 25 review papers on a topic, but maybe I think they all suck (they might), and I can write the 26th and explain it to people the way I wish it had been explained to me.

Now you might say that in the arXiv age there was no limit on pages anyway, but LLMs really do problematize things because they are pretty good at summarization. Send one off on a mission to write a review paper and in some ways it will do better than I do, in other ways worse. Plenty of people have no taste or sense of quality and are going to miss the latter. Hypothetically, people could do better as a centaur (human plus LLM), but I think usually they don't because of that.

One could make the case that LLMs make review papers obsolete, since you can always ask one to write a review for you or just have conversations about the literature with it. I know I could have spent a very long time studying the literature on Heart Rate Variability and eventually made up my mind about which of the 20 or so metrics I want to build into my application. I did look at some review papers, and I can highlight sentences that support my decisions, but I made those decisions based on a few weekends of experiments and talking to LLMs. The funny thing is that if you went to a conference, met the guy who wrote the review paper, and asked him the hard question, "I can only display one metric on my consumer-facing HRV app, which one do I show?", he would give you a clear answer that isn't in the review paper, and maybe the odds are 70-80% that it would be my answer.

whiplash451 5 hours ago

I'm not sure why we're so focused on filtering what gets into arXiv (which is an uphill battle and DOA at this point) rather than fixing the indexing, i.e. the PageRank of academia.

Google "sorted out" a messy web with PageRank. Academic papers link to each other. What prevents us from building a ranking from there?

I'm conscious I might be over-simplifying things, but curious to see what I am missing.
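To make the question concrete, here is a minimal sketch of plain power-iteration PageRank applied to citations, assuming a toy citation graph stored as a dict that maps each (hypothetical) paper ID to the IDs it cites. The damping factor 0.85 is the conventional choice from the original PageRank formulation; the paper IDs and the function name `citation_pagerank` are illustrative, not any existing academic ranking service.

```python
def citation_pagerank(cites, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank over a citation graph.

    `cites` maps each paper ID to the paper IDs it cites (hypothetical data).
    Rank flows from a citing paper to the papers it cites.
    """
    # Collect every paper, including ones that only appear as cited.
    papers = set(cites)
    for refs in cites.values():
        papers.update(refs)
    papers = sorted(papers)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}

    for _ in range(max_iter):
        # Papers that cite nothing ("dangling") spread their rank uniformly.
        dangling = sum(rank[p] for p in papers if not cites.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for p in papers:
            refs = set(cites.get(p, ()))  # count repeated citations once
            if refs:
                share = damping * rank[p] / len(refs)
                for q in refs:
                    new[q] += share
        converged = sum(abs(new[p] - rank[p]) for p in papers) < tol
        rank = new
        if converged:
            break
    return rank


# Toy example: a review cites three papers, two of which cite "b".
toy = {"review": ["a", "b", "c"], "a": ["b"], "c": ["b"], "b": []}
ranks = citation_pagerank(toy)
for paper, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{paper}: {score:.3f}")
```

In this toy graph "b", cited by every other paper, ends up ranked highest, and the uncited review ranks lowest. Note that this ranks purely by citation structure: it says nothing about whether a highly ranked preprint was ever reviewed, and it is exactly as gameable as the citations feeding it.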

muhneesh 2 hours ago

tangentially related: https://readabstracted.com/

asimpleusecase 11 hours ago

I wonder if there are plans to license the content for AI training.

Drblessing 5 hours ago

ArXiv is dead. Expect a paywall within three years, or other enshittification and slop added.