Differentiating between AGI and non-AGI, if we ever get remotely close, would be challenging, but for now it's trivial. The defining feature of AGI is recursive self-improvement across any field. Without self-improvement, you're just regurgitating. Humanity started with no advanced knowledge, not even a language. In what should practically be a heartbeat at the speed of distributed computing, with perfect memory and computational power, we were landing a man on the Moon.
So one fundamental difference is that AGI would not need some absurdly massive data dump to become intelligent. In fact, you would prefer to feed it as minimal a set of the most primitive first principles as possible, because it's certain that much of what we think is true is going to end up being not quite so -- the same as for humanity at any other given moment in time.
We could derive more basic principles, but this one is fundamental and already completely incompatible with our current direction. Right now we're trying to train on essentially the entire corpus of human writing. That is a de facto acknowledgement that the absolute endgame for current tech is simple mimicry, mistakes and all. It'd create a facsimile of impressive intelligence, because no human would have a remotely comparable knowledge base, but it'd basically just be a glorified natural language search engine - frozen in time.
I mostly agree with you. But if you think about it, mimicry is an aspect of intelligence. If I can copy you and do what you do reliably, regardless of the method used, that does capture an aspect of intelligence. The true game changer is a reflective AI that can automatically improve upon itself.
Your quote is a non sequitur to your question. The reason you want to avoid massive data dumps is that there are guaranteed to be errors and flaws in them. See AlphaGo vs AlphaGo Zero: the former was bootstrapped from a large corpus of human expert games, while the latter was trained entirely through self-play against itself.
The zero-trained version not only ended up dramatically outperforming the 'expert' version, but reached higher levels of competence in a fraction of the training time. And that should be entirely expected: there were obviously tremendous flaws in our understanding of the game, and training on those flaws resulted in software that seemingly permanently handicapped itself.
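To make "trained entirely on itself" concrete, here is a minimal sketch of a self-play loop in the spirit of a zero-style agent. The `Game` interface, `Policy` class, and tabular update rule are hypothetical simplifications for illustration, not the actual AlphaGo Zero algorithm (which uses MCTS plus a deep network); the point is only that the sole training signal is the outcome of games the policy plays against itself.

```python
# Minimal sketch of self-play training: no human games are ever consumed;
# the only training signal is the result of games the policy plays against itself.
# Game and its methods are hypothetical stand-ins for a real two-player game.
import random


class Policy:
    def __init__(self):
        self.values = {}  # (state, move) -> running value estimate

    def choose(self, state, legal_moves, epsilon=0.1):
        # Mostly greedy, with a little exploration.
        if random.random() < epsilon:
            return random.choice(legal_moves)
        return max(legal_moves, key=lambda m: self.values.get((state, m), 0.0))

    def update(self, state, move, signed_outcome, lr=0.1):
        key = (state, move)
        old = self.values.get(key, 0.0)
        self.values[key] = old + lr * (signed_outcome - old)


def self_play_training(game_factory, policy, games=10_000):
    for _ in range(games):
        game = game_factory()
        trajectory = []  # (player, state, move) for every move played
        while not game.is_over():
            state = game.state()
            move = policy.choose(state, game.legal_moves())
            trajectory.append((game.current_player(), state, move))
            game.play(move)
        outcome = game.result()  # +1 if player 0 won, -1 if player 1 won, 0 draw
        for player, state, move in trajectory:
            # Credit each move with the result from that player's perspective.
            policy.update(state, move, outcome if player == 0 else -outcome)
    return policy
```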
Minimal expert training also has other benefits. The obvious one is that you don't require anywhere near as much training material, and it also makes it easier to verify that you're on the right track. Seeing software 'invent' fundamental arithmetic is somewhat easier to verify and follow than it producing a hundred-page proof advancing, in a novel way, some esoteric edge theory of mathematics. Presumably it would also require orders of magnitude less operational time to achieve such breakthroughs, especially given the reduction in preexisting state.
The moment after birth, the human agent starts a massive information-gathering process - one that no other system really expects much coherent output from - for 5-10 years. Call it a “data dump”: some of that data is good, and some of it is bad. This in turn leads to biases and poor thinking models; everything that you described is also applicable to every intelligent system, including humans. So again, you're presupposing that there's some kind of perfect-information benchmark that couldn't exist.
When that system comes out of the birth canal, it already has embedded in it millions of years of encoded expectations, prediction systems, and functional capabilities that are going to grow independent of what the environment does (though they will certainly be shaped in their interactions by the environment).
So no matter what, you have a structured system of interaction that must be loaded with previously encoded data (experience, transfer learning, etc.), and it doesn't matter what type of intelligent system you're talking about: there are foundational assumptions at the physical interaction layer that encode all previous time steps of evolution.
Said an easier way: a lobster, because of the encoded DNA that created it, will never have the same capabilities as a human, because it is structured to process information completely differently and its actuators don't have the same type and level of granularity as human actuators.
Now assume that you are a lobster compared to a theoretical AGI in sensor-effector combination. Most likely it would be structured entirely differently from you as a biological thing - but the mere design itself carries with it an encoding of structural information from all the previous systems that made it possible.
So by your definition you’re describing something that has never been seen in any system and includes a lot of assumptions about how alternative intelligent systems could work - which is fair because I asked your opinion.
With due respect, I do not think you're tackling the fundamental issue, which I do not think is particularly controversial: intelligence and knowledge are distinct things, with the latter created by the former. What we're aiming to do is create an intelligent system, a system that can create fundamentally new knowledge and not simply reproduce or remix it on demand.
The next time you're in the wilds, it's quite amazing to consider that your ancestors, millennia past, would have looked at more or less these exact same wilds, but with so much less knowledge. Yet nonetheless they would discover such knowledge - teaching themselves, and ourselves, to build rockets, put a man on the Moon, unlock the secrets of the atom, and so much more. All from zero.
---
What your example and elaboration focus on is the nature of intelligence and the difficulty of replicating it. And I agree. This is precisely why we want to avoid making the problem infinitely more difficult, costly, and time-consuming by dumping endless amounts of knowledge into the equation.
Intelligence and knowledge being different things is quite the claim - namely, it sounds like you're stuck in the Cartesian dualist world and haven't transitioned into statistical empiricism.
I’m curious what epistemological grounding you’re basing your claim on.
I don't understand how you can equate the two and reconcile the past. The individuals who have pushed society forward in this domain or that scarcely, if ever, had any particular knowledge edge. Cases like Ramanujan [1] exemplify such to the point of absurdity.
If you took the average human from birth and gave them only 'the most primitive first principles', the chance that they would have novel insights into medicine is slim.
I also disagree with your following statement:
> Right now we're trying to train on essentially the entire corpus of human writing. That is a de facto acknowledgement that the absolute endgame for current tech is simple mimicry
At worst it's complex mimicry! But I would also say that mimicry is part of intelligence in general and part of how humans discover. It's also easy to see that AI can learn things - you can teach an AI a novel language by feeding a fairly small amount of example text, vocabulary, and grammar into its context.
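As an illustration of that in-context claim, here is a minimal sketch of such a "lesson" prompt for an invented mini-language. The language "Miro" is made up for the example, and `query_model` is a hypothetical stand-in for whatever chat completion API you would actually call; nothing in it touches the model's weights.

```python
# Sketch of teaching a model a tiny invented language purely in-context.
# "Miro" and query_model are illustrative assumptions, not real artifacts.

LESSON = """You are learning an invented language called Miro.
Vocabulary: 'kasu' = water, 'lim' = to drink, 'tane' = person, 'ro' = big.
Grammar: word order is Subject-Object-Verb, adjectives follow the noun,
and the suffix '-na' marks past tense on the verb.

Translate into Miro: "The big person drank water."
"""


def query_model(prompt: str) -> str:
    """Hypothetical stand-in: replace with a real call to your provider's chat API."""
    raise NotImplementedError


if __name__ == "__main__":
    # A reasonable completion would be something like: "tane ro kasu lim-na"
    print(query_model(LESSON))
```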
I also disagree with this statement:
> One fundamental difference is that AGI would not need some absurdly massive data dump to become intelligent
I don't think how something became intelligent should affect whether it is intelligent or not. These are two different questions.
> you can teach an AI a novel language by feeding a fairly small amount of example text, vocabulary, and grammar into its context.
You didn't teach it; the model is still the same after you ran that. That is the same as a human following instructions without internalizing the knowledge: they forget it afterward and haven't learned what they performed. If that were all humans did, then there would be no point in school etc., but humans do so much more than that.
As long as LLMs are like a human with Alzheimer's, they will never become a general intelligence. And following instructions is not learning at all; learning is building an internal model for those instructions that is more efficient and general than the instructions themselves. Humans do that, and that is how we manage to advance science and knowledge.
It depends on what you count as learning - you told it something, and it then applied that new knowledge, and if you come back to that conversation in 10 years, it will still have that new knowledge and be able to use it.
Then when OpenAI does another training run it can also internalise that knowledge into the weights.
This is much like humans - we have short-term memory (where it doesn't get into the internal model), and then things get baked into long-term memory during sleep. AIs have context-level memory, and then that learning gets baked into the model during additional training.
Although whether or not it changed the weights is, IMO, not a prerequisite for whether something can learn. I think we should be able to evaluate whether something can learn by looking at it as a black box, and we could build a black box meeting this definition if you spoke to an LLM limited to its max context length each day, and then ran an overnight training run to incorporate the learned knowledge into the weights.
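A minimal sketch of that black-box construction, under the stated assumption of a daily context window plus an overnight fine-tune. `chat_with_context` and `finetune_on` are hypothetical stand-ins for a provider's chat and fine-tuning APIs, not real endpoints.

```python
# Sketch of the day/night loop described above: context-window "short-term
# memory" during the day, weight updates ("long-term memory") overnight.
from dataclasses import dataclass, field


@dataclass
class Model:
    version: int = 0  # bumped each time the weights are updated


def chat_with_context(model: Model, context: list) -> str:
    raise NotImplementedError  # hypothetical stand-in for a chat completion call


def finetune_on(model: Model, transcripts: list) -> Model:
    raise NotImplementedError  # hypothetical stand-in for a fine-tuning job


@dataclass
class Agent:
    model: Model
    context: list = field(default_factory=list)  # cleared every "day"

    def talk(self, message: str) -> str:
        # Daytime: everything lives in the context window only.
        self.context.append(("user", message))
        reply = chat_with_context(self.model, self.context)
        self.context.append(("assistant", reply))
        return reply

    def sleep(self):
        # Nighttime: bake today's conversation into the weights,
        # then start tomorrow with an empty context window.
        self.model = finetune_on(self.model, self.context)
        self.context = []
```

Viewed from outside, an agent wired this way would retain what it was told yesterday even with an empty context today, which is the black-box sense of "learning" being argued for here.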