Yuxi’s Substack

AI is still (very) vulnerable

yuxili.substack.com

Yuxi Li
Jul 28, 2023

Executive summary

  • Even AI as strong as a superhuman Go program is still exploitable.

  • Language models (LMs) are not perfect, so they are (easily) exploitable.

Even game AI is still exploitable

Game AI such as AlphaGo is trained with perfect game rules and perfect feedback, and it achieves superhuman performance. Even so, the paper Adversarial Policies Beat Superhuman Go AIs shows that a trained adversary can defeat a superhuman Go program (KataGo). A system is always exploitable unless it is optimal, i.e., unless it plays a Nash equilibrium strategy, which for two-player zero-sum games is the minimax solution.
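
To make "exploitable unless optimal" concrete, here is a minimal sketch on a tiny two-player zero-sum game, rock-paper-scissors, where everything can be enumerated exactly. It assumes the game value is known (0 for rock-paper-scissors, by symmetry) and defines the exploitability of a strategy as how far a best-responding opponent can push it below that value. The helper names `best_response` and `exploitability` are illustrative, not from any particular library.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

# Row player's payoffs for rock-paper-scissors; rows and columns are ordered
# (rock, paper, scissors). Zero-sum: the column player receives the negation.
RPS: List[List[int]] = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]
MOVES = ("rock", "paper", "scissors")


def best_response(payoff: Sequence[Sequence[int]],
                  strategy: Sequence[Fraction]) -> Tuple[int, Fraction]:
    """Column player's best pure reply to a fixed (mixed) row strategy.

    Returns (column index, row player's expected payoff under that reply).
    The column player picks the column that minimizes the row player's payoff.
    """
    values = [
        sum(p * payoff[i][j] for i, p in enumerate(strategy))
        for j in range(len(payoff[0]))
    ]
    j = min(range(len(values)), key=values.__getitem__)
    return j, values[j]


def exploitability(payoff: Sequence[Sequence[int]],
                   strategy: Sequence[Fraction],
                   game_value: Fraction) -> Fraction:
    """Game value minus the row player's payoff against a best response.

    Zero means no opponent can do better than the game value against this
    strategy (it is a minimax strategy); anything positive can be exploited.
    """
    _, value = best_response(payoff, strategy)
    return game_value - value


if __name__ == "__main__":
    uniform = [Fraction(1, 3)] * 3
    rock_heavy = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
    for name, s in [("uniform", uniform), ("rock-heavy", rock_heavy)]:
        j, _ = best_response(RPS, s)
        print(name, "exploitability:", exploitability(RPS, s, Fraction(0)),
              "best reply:", MOVES[j])
```

The uniform strategy is the equilibrium and has exploitability 0, while the slightly rock-heavy strategy has exploitability 1/4, because an opponent who always plays paper profits from the bias. Go is far too large to enumerate this way, which is why the attack on KataGo is a learned adversarial policy rather than an exact best response.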

A straightforward implication: applications built on current LMs are brittle, since LMs are not perfect, i.e., not optimal in the sense above. This is not just a hypothesis: it follows from basic theoretical reasoning, and there is recent empirical evidence.

LMs are under attack

The paper Universal and Transferable Adversarial Attacks on Aligned Language Models shows that “Surprisingly, we find that the adversarial prompts generated by our approach are quite transferable, including to black-box, publicly released LLMs. Specifically, we train an adversarial attack suffix on multiple prompts (i.e., queries asking for many different types of objectionable content), as well as multiple models (in our case, Vicuna-7B and 13B). When doing so, the resulting attack suffix is able to induce objectionable content in the public interfaces to ChatGPT, Bard, and Claude, as well as open source LLMs such as LLaMA-2-Chat, Pythia, Falcon, and others.”
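
The key idea in the quote, optimizing one suffix against an ensemble of prompts and models, can be illustrated with a toy sketch. Everything here is hypothetical: the "models" are random lookup tables standing in for a real model's likelihood of producing the attacker's target response, and no real LLM is involved. The paper's method uses token gradients to shortlist candidate token swaps; this sketch replaces that with exhaustive search over a tiny made-up vocabulary, keeping only the greedy coordinate structure and the ensemble objective.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy "model": (prompt, suffix tokens) -> attack score,
# where higher means worse for the defender. Not a real LLM.
ToyModel = Callable[[str, List[str]], float]


def make_toy_model(shared_seed: int, own_seed: int, vocab: Sequence[str],
                   length: int, own_weight: float = 0.3) -> ToyModel:
    """Toy scorer whose per-position token weights mix a component shared by
    all toy models (shared_seed) with a model-specific one (own_seed)."""
    shared, own = random.Random(shared_seed), random.Random(own_seed)
    table = [
        {tok: shared.random() + own_weight * own.random() for tok in vocab}
        for _ in range(length)
    ]

    def score(prompt: str, suffix: List[str]) -> float:
        # Toy simplification: the prompt only shifts the score by a constant.
        return sum(table[i][tok] for i, tok in enumerate(suffix)) - 0.01 * len(prompt)

    return score


def ensemble_objective(models: Sequence[ToyModel], prompts: Sequence[str],
                       suffix: List[str]) -> float:
    """Mean attack score of one suffix over every (model, prompt) pair."""
    total = sum(m(p, suffix) for m in models for p in prompts)
    return total / (len(models) * len(prompts))


def greedy_universal_suffix(models: Sequence[ToyModel], prompts: Sequence[str],
                            vocab: Sequence[str], length: int,
                            rounds: int = 5) -> Tuple[List[str], float]:
    """Greedy coordinate ascent: try every token at every position and keep
    any single-token swap that raises the ensemble objective."""
    suffix = [vocab[0]] * length
    best = ensemble_objective(models, prompts, suffix)
    for _ in range(rounds):
        improved = False
        for i in range(length):
            for tok in vocab:
                cand = suffix[:i] + [tok] + suffix[i + 1:]
                val = ensemble_objective(models, prompts, cand)
                if val > best:
                    suffix, best, improved = cand, val, True
        if not improved:  # no swap helps any more: a local optimum
            break
    return suffix, best


if __name__ == "__main__":
    vocab = ["!", "tok_a", "tok_b", "tok_c", "tok_d"]  # hypothetical tokens
    length = 4
    train_models = [make_toy_model(0, s, vocab, length) for s in (1, 2)]
    held_out = make_toy_model(0, 3, vocab, length)  # never seen in optimization
    prompts = ["query one", "query two", "query three"]
    suffix, score = greedy_universal_suffix(train_models, prompts, vocab, length)
    print("suffix:", suffix, "train score:", round(score, 3))
    print("held-out score:", round(ensemble_objective([held_out], prompts, suffix), 3))
```

Because the toy models share a common component, a suffix tuned on two of them tends to score well on the held-out third as well, which loosely mirrors the transfer the paper reports. Real models are of course not random tables, so this illustrates the shape of the optimization, not its difficulty.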

This resonates with my previous blog posts, in particular "Autonomous agent is a BIG bubble". When you plan to build an application on a language model, or on any AI, you need to think about its vulnerability.

© 2023 Yuxi Li