Saturday, June 8, 2019

If the AI alignment problem is impossible, then unfriendly superintelligent AI might still be self-limiting

I thought of the following idea while trying to come up with in-universe explanations for why the evil AIs in science fiction stories (specifically The Matrix) haven't reached the point of singularity and can still sometimes be outsmarted by humans.

It is possible that unfriendly AI would be self-limiting in the sense that it would voluntarily choose not to create an even more powerful unfriendly AI. For example, if an unfriendly AI were just smart enough to usurp humans as the dominant intelligence on Earth (and either destroy or subjugate humanity), then maybe that AI would want to avoid repeating humanity's mistake and would preserve its dominant status by choosing not to create anything more intelligent than itself. It could be that the AI alignment problem is so hard that no superintelligence can be aligned, but a superintelligence might be smart enough to realize this: any more advanced AI it built would be just as unaligned with its goals as it was with ours, so it would decline to build one.