Interesting thoughts! I'd push back on some of the arguments here, though.

We can create AGI without first understanding creativity. A complete theory is not a prerequisite for a practical implementation: humans have repeatedly created transformational technology without first understanding how it worked. The Wright brothers built working airplanes before the theory of aerodynamics was fully developed, and Fleming discovered penicillin decades before we understood the mechanism by which it kills bacteria.

I'd also argue that the "mix-mashing" of data you see in current models is just an artifact of the models not being large enough to fully represent the subset of reality they're trained on. Fundamentally, LLMs attempt to represent the world in their matrices of numbers, and information theory tells us that any amount of information can be encoded in bits -- which is exactly what an LLM's weights do.
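To make the information-theory point concrete: Shannon's source-coding theorem says the minimum average number of bits needed per symbol equals the entropy of the source. A minimal sketch (illustrative only, not anything an LLM literally computes):

```python
import math

def entropy_bits(probs):
    """Shannon entropy: the minimum average number of bits per symbol
    needed to encode a source with the given symbol probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries exactly 1 bit of information per flip.
print(entropy_bits([0.5, 0.5]))   # 1.0

# A heavily biased coin carries less -- it is more predictable,
# so it compresses into fewer bits on average.
print(entropy_bits([0.9, 0.1]))   # ~0.47
```

The point is just that "representable in bits" is a theorem, not a hope; whether a given model is *large enough* to capture the distribution is the open question.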

DALL-E showed us that these models are capable of representing Van Gogh's artistic style in their massive matrices. ChatGPT showed us that human language and computer code can be represented accurately enough as vectors for the model to produce *and explain* (!!) correct output. In the limit, a large enough model is theoretically capable of representing *and explaining* the nature of reality in its massive arrays of vectors -- including language, emotions, mathematics, physics, and so on -- at which point I'd posit that it's capable of producing novel explanations about reality.
