Yuxi’s Substack
Study Material for Reinforcement Learning
David Silver, Reinforcement Learning Course (classic); Rich Sutton and Andrew Barto, Reinforcement Learning: An Introduction (standard textbook); OpenAI…
Nov 24 • Yuxi Li
Q*, Reinforcement Learning and Search
with applications in LLMs
Nov 24 • Yuxi Li
Will synthetic data help?
The short answer is: it depends. In the following, we discuss 1) ground-truth-in-the-loop, 2) the simulation-to-reality gap, 3) AlphaZero vs ChatGPT, lessons…
Nov 24 • Yuxi Li
Levels of AGI & Autonomy
DeepMind published a paper discussing levels of AGI and autonomy: "Levels of AGI: Operationalizing Progress on the Path to AGI". The authors discuss…
Nov 14 • Yuxi Li
How would Deepmind Gemini work?
DeepMind will reportedly launch Gemini soon. How would it work? Let’s see what DeepMind/Google have published: BERT, T5, LaMDA, PaLM, Sparrow…
Nov 8 • Yuxi Li
October 2023
Over-claim then Correct: A New Norm of Research in the Era of LLMs
In the era of LLMs, it is the norm for people to make many bold, general claims. Then, after a little while, people correct them one by one. So many such…
Oct 24 • Yuxi Li
September 2023
Blockchains Require Dramatic Innovations to Prosper
Blockchains call for killer apps. Deep learning has revolutionized computer vision and natural language processing (NLP), with applications like face…
Sep 25 • Yuxi Li
Human alignment is very hard
Should physicist J. Robert Oppenheimer have developed the atomic bomb? Should nuclear weapons have been used on zero, one, or two cities…
Sep 4 • Yuxi Li
August 2023
Agent: What, Why, How.
The agent is a core concept in AI. As Large/Language Models (LMs) become popular, people are talking about building autonomous agents based on LMs…
Aug 31 • Yuxi Li
July 2023
AI is still (very) vulnerable
Executive summary: AI as strong as a Go program is still exploitable. LMs are not perfect, so they are (easily) exploitable. Even game AIs are still exploitable…
Jul 28 • Yuxi Li
RL(HF) Helps LMs
Executive summary: RLHF is a popular approach for human value alignment, as in ChatGPT. Direct Preference Optimization (DPO) does not need a reward model…
Jul 23 • Yuxi Li
Autonomous agent is a BIG bubble
Executive summary: The autonomous agent is still an open problem in AI. Agency/planning is a prerequisite. Agency is achieved by interactions with the…
Jul 23 • Yuxi Li