• zcd@lemmy.ca
    4 months ago

    AI “human-centipeding” itself with art and word salads is going to make confirmed human-generated content an extremely valuable resource. That’s partly why literally everybody is foaming at the mouth trying to scrape all of our shit. The signal-to-noise ratio (human content being the signal) is only going to get worse. The dead Internet is almost here, if it isn’t here already.

    • FMT99@lemmy.world
      4 months ago

      Same. I keep hearing folks mention this, but it’s not like AI developers aren’t aware of it (apart from a bunch of shitty startups that would fail no matter what). One way to deal with it, for example, is buying “pre-AI” datasets, which is why Microsoft is shelling out so much for Reddit’s data, and I’m sure there are a lot more of those kinds of initiatives.

      Google, on the other hand, is going to be hard-pressed to deal with the ever-increasing deluge of AI spam.

      • Ultraviolet@lemmy.world
        4 months ago

        That’s a way to deal with it, but in the long term the “pre-AI” cutoff recedes further and further into the past, and data from before it becomes less and less useful for any practical purpose.

    • SaucySnake@lemmy.world
      4 months ago

      Here’s a paper (https://arxiv.org/abs/2306.07899) that found that one of the biggest sources of LLM training data is being corrupted by people using AI to complete the tasks. There are plenty of other papers out there showing the effects of training on that kind of data, an effect those papers call “model collapse”.
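
For anyone who wants to see the “model collapse” idea in miniature, here is a toy sketch. It is not taken from the linked paper or from any real LLM pipeline: the "model" is assumed to be just a Gaussian that gets refit, generation after generation, on a small sample drawn from the previous generation's fit. The function name `simulate_collapse` and all parameter values are made up for illustration. With small samples, the fitted spread tends to shrink over time, which is a rough analogue of a model losing the variety that was in the original human data.

```python
import random
import statistics


def simulate_collapse(generations=100, sample_size=10, seed=0):
    """Refit a Gaussian on its own samples, one generation at a time.

    Returns the fitted standard deviation for every generation,
    starting with the "human" distribution N(0, 1) at index 0.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0          # generation 0: the original "human" data
    history = [sigma]

    for _ in range(generations):
        # The current model produces the next generation's training data.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # The next model is fit only on that synthetic data.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples, mu)
        history.append(sigma)

    return history


if __name__ == "__main__":
    stds = simulate_collapse()
    for gen in (0, 25, 50, 75, 100):
        print(f"generation {gen:3d}: fitted std = {stds[gen]:.6f}")
```

Running it prints the fitted standard deviation at a few generations. Mixing some fresh samples from the original N(0, 1) into every generation (not shown) is the toy version of why people pay for confirmed human data.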