Sounds like it’d be nice if you had real control over the car’s software, and you could roll it back.
This… also makes me a little more wary driving around Teslas in traffic.
The localllama people are feeling quite mixed about this, as Apple is still charging through the nose for more RAM. Like, orders of magnitude more than the higher-capacity memory ICs actually cost.
It’s kinda poetic. Apple wants to go all in on self-hosted AI now, yet their incredible RAM stinginess over the years is derailing that.
There is a breaking point, eventually. YouTube’s trajectory is gonna make next quarter’s revenue great, but sooner or later something else will pick up users’ attention instead.
I don’t even look at the algo anymore, I just go out and search for content externally.
Maybe I am just out of touch, but I smell another bubble bursting when I look at how enshittified all major web services are simultaneously becoming.
It feels like something has to give, right?
We have YouTube, Reddit, Twitter, and more racing to enshittify faster than I can believe, and Google Search is racing to destroy the internet. Yet they’re also at the ‘critical mass’ of ‘too big to fail,’ having already shoved out all their major competitors (other than Discord, I guess).
There are already open source/self hosted alternatives, like Perplexica.
CEO Tony Stubblebine says it “doesn’t matter” as long as nobody reads it. They keep generating sign-ups and selling ads… till next quarter, at least.
Soldered is better! It’s sometimes faster, and definitely faster if it happens to be LPDDR.
But TBH the only thing that really matters is “how much VRAM do you have,” and Qwen 32B slots in at 24GB, or maybe 16GB if the GPU is totally empty and you tune your quantization carefully. And the cheapest way to get that (until 2025) is a used MI60, P40 or 3090.
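If anyone wants the back-of-the-envelope math: the bits-per-weight figures below are approximate GGUF values, and this ignores KV cache and runtime overhead, so treat it as a rough sketch rather than exact numbers.

```python
# Rough VRAM needed for a 32B-parameter model at common GGUF quants.
# Bits-per-weight values are approximate; KV cache/overhead not included.
PARAMS = 32e9

bits_per_weight = {
    "Q8_0": 8.5,     # near-lossless
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.85,  # the usual 24GB-card sweet spot
    "IQ3_XS": 3.3,   # what you'd need to squeeze into 16GB
}

for quant, bpw in bits_per_weight.items():
    gib = PARAMS * bpw / 8 / 1024**3
    print(f"{quant}: ~{gib:.1f} GiB")
```

That prints roughly 18 GiB for Q4_K_M (fits a 24GB card with room for context) and about 12 GiB for a 3-bit quant, which is how you land on a 16GB GPU only if nothing else is using it.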
TSMC doesn’t really have official opinions, they take silicon orders for money and shrug happily. Being neutral is good for business.
Altman’s scheme is just a whole other level of crazy though.
It’s useful.
I keep Qwen 32B loaded on my desktop pretty much whenever it’s on, as an (unreliable) assistant to analyze or parse big texts, do quick chores or write scripts, bounce ideas off of, or even as an offline replacement for Google Translate (though I specifically use Aya 32B for that).
It does “feel” different when the LLM is local: you can manipulate the prompt syntax so easily, hammer it with multiple requests that come back really fast when it seems to get something wrong, and not worry about refusals or data leakage and such.
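For anyone curious what that workflow looks like, here’s a minimal sketch assuming an Ollama server on localhost:11434 with a Qwen 32B model already pulled (the model tag here is illustrative; swap in whatever you run):

```python
# Minimal sketch: hammering a locally hosted model with quick chores.
# Assumes an Ollama server at localhost:11434 with the model pulled.
import json
import urllib.request

def ask(prompt: str) -> str:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({
            "model": "qwen2.5:32b",  # illustrative tag
            "prompt": prompt,
            "stream": False,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# No rate limits, no data leaving the machine; just re-ask if it's wrong.
print(ask("Summarize this changelog in three bullet points: ..."))
```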
The model seems OK for tasks like summarisation, though.
That, and retrieval, and the business use cases so far. And even then, only where it’s acceptable for the results to be wrong somewhat frequently.
The term AI will become every bit as radioactive to investors in the future as it is lucrative right now.
Well you say that, but somehow crypto is still around despite most schemes being (IMO) much more explicit scams. We have politicians supporting it.
Current LLMs cannot be AGI, no matter how big they are. The fundamental architecture just isn’t right.
It’s selling an anticompetitive dystopia. It’s selling a Facebook monopoly vs selling the Fediverse.
We don’t need 7 trillion dollars of datacenters burning the Earth; we need collaborative, open source innovation.
When Mr. Altman visited TSMC’s headquarters in Taiwan shortly after he started his fund-raising effort, he told its executives that it would take $7 trillion and many years to build 36 semiconductor plants and additional data centers to fulfill his vision, two people briefed on the conversation said. It was his first visit to one of the multibillion-dollar plants.
TSMC’s executives found the idea so absurd that they took to calling Mr. Altman a “podcasting bro,” one of these people said. Adding just a few more chip-making plants, much less 36, was incredibly risky because of the money involved.
Well there is a very specific architecture “rut” the LLMs people use have fallen into, and even small attempts to break out (like with Jamba) don’t seem to get much interest, unfortunately.
It’s great at brainstorming, fiction writing, being an unreliable but very fast intern-like assistant, and so on… but none of that is very profitable.
Hence you get OpenAI and such trying to sell it as an omniscient chatbot and (most profitably) an employee replacement.
I mean… it is machine learning.
Easy, local AI.
Keep generative AI locally runnable instead of corporate hosted. Make it free, open and accessible. This gives the little guys the cost advantage, and takes away the scaling advantages of mega publishers. Lemmy users should be familiar with this concept.
Whenever I hear people rail against AI, I tell them they’re handing the world to Sam Altman and his dystopia; he doesn’t care about stolen content, equality, or them. I get a lot of hate for it. But they need to be fighting the corporate-vs-open AI battle instead.