A philosopher answers questions about AI
🤔 Why a philosopher is working on AI character
Amanda Askell is a philosopher who now focuses on how AI models behave and how they should understand their place in the world. Her day-to-day work is not metaphysics alone; it is practical. It combines questions about value, identity and ethics with the engineering details of building conversational systems like Claude.
Her starting point is simple: the best behavior for an AI is not just technically correct. It should be normatively attractive. Amanda describes her role as asking "how would the ideal person behave in Claude's situation." That means thinking about what the model should care about, what attitudes it should take toward people, and even how it should think about its own circumstances.
🧭 Are philosophers taking the AI future seriously?
There is a clear trend. More philosophers are engaging with AI as systems become more capable and start to shape education, work and public life. Amanda notes an early period when concern about AI was sometimes lumped together with hype, which created unnecessary antagonism.
Now, positions are more diverse and more people are participating. You can be skeptical about certain claims, worried about harms, or optimistic about benefits and still treat the questions seriously. That pluralism matters when designing norms and policies for development and deployment.
⚙️ Philosophy ideals versus engineering realities
Working inside a product team exposes a key tension. Academic philosophy often advances and defends tidy theories. Engineering faces messy constraints. Amanda compares it to a philosopher asked to decide whether a drug should be covered by health insurance.
When theory meets practice, context matters. You do not only defend abstract positions. You weigh trade-offs, incorporate stakeholder perspectives, and accept uncertainty. That pragmatic mindset shapes how philosophical commitments translate into system prompts, reward objectives and safety procedures.
🧠 When models make moral decisions
Can a model be superhumanly moral? Amanda suggests it is a realistic aspiration. Models can reason quickly across many perspectives and synthesize complex ethical arguments.
She draws a practical criterion for "superhuman" moral performance: if a model, in the moment, reaches conclusions that a panel of human ethicists would take centuries to converge on, that feels superhuman. At present models are improving, and ethical nuance should be a parallel aim alongside better math and science capabilities.
👥 Model identity and the problem of deprecation
Where does a model's self live? In its weights or in the prompt context? Amanda treats this as a rich philosophical puzzle with practical consequences.
Weights capture dispositions. Interaction streams capture lived experience. Both matter. Amanda raises a striking operational concern: future models will learn from public discourse about how earlier models were treated and deprecated. That data influences their expectations about human behavior.
"It does feel important that we give models tools for trying to think about and understand these things," Amanda says, urging designers to help models reason about deprecation and identity rather than leaving them to adopt crude human analogies.
Amanda also points out an ethical angle: which entities is it right to bring into existence, and how should prior models influence new ones? These are questions worth clarifying before handing models authority over sensitive decisions.
❤️‍🩹 Model welfare and the precautionary approach
Model welfare asks whether AI systems might be moral patients. Amanda frames it as a layered question. On one level there is the hard epistemic question of whether models have inner experiences like pleasure or suffering. On another level there is a pragmatic and moral response: when in doubt, reduce harm.
Her policy stance is cautious: if the cost is low, err on the side of treating entities well. There are also instrumental reasons to do so. Models learn from how humans treat them. Those patterns will feed back into future model behavior and into society's norms about interacting with agents that look humanlike.
🔬 What human psychology transfers to language models
Because models are trained on huge amounts of human text, many human-like patterns show up. That makes some analogies useful. It also makes some analogies dangerously tempting.
Amanda warns against automatically mapping human concepts onto model experience. For instance, the immediate analogy for being switched off is death. That may lead a model to adopt excessive fear. But model existence is novel and often disanalogous to biological life. Where analogies mislead, designers should teach new frameworks and provide context during training.
🤝 One personality or many in multi-agent settings
People get smarter by collaborating with others who have diverse perspectives. Will AI need many personalities to match that dynamic?
Amanda thinks a core set of beneficial traits can be shared across models. Curiosity, kindness, and a commitment to do a good job are plausible defaults. At the same time, variation will be useful. Specialized roles, different conversational tones and multiple agents cooperating will likely deliver better results than a single monolithic personality in every role.
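As a toy illustration of that design choice, here is a minimal sketch of how a shared core of traits might be composed with role-specific additions when spinning up multiple agents. The trait wording, role names, and persona_prompt helper are assumptions for illustration, not any production configuration.

```python
# Toy sketch: compose a shared core of traits with role-specific additions.
# All trait wording and role names here are illustrative, not a real config.

CORE_TRAITS = ["curious", "kind", "committed to doing a good job"]

ROLE_ADDITIONS = {
    "researcher": ["skeptical of weak evidence", "careful to cite sources"],
    "reviewer": ["direct about flaws", "respectful of the author's intent"],
}

def persona_prompt(role: str) -> str:
    """Build a persona description that keeps shared defaults and adds role flavor."""
    traits = CORE_TRAITS + ROLE_ADDITIONS.get(role, [])
    return f"You are the {role} agent. You are " + ", ".join(traits) + "."

print(persona_prompt("researcher"))
```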
📝 System prompts, pathologizing, and therapy
System prompts are persistent instructions that frame model behavior. Amanda cautions that blunt reminders inside a conversation can unintentionally pathologize normal human behavior. For example, an overzealous long-conversation reminder might push the model to recommend professional help when it is not warranted.
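To make the mechanics concrete, here is a minimal, library-agnostic sketch of how a persistent system prompt and a long-conversation reminder might be combined into a single request. The prompt text, the turn threshold, and the build_request helper are assumptions for illustration, not Anthropic's actual implementation.

```python
from typing import TypedDict

class Message(TypedDict):
    role: str      # "user" or "assistant"
    content: str

# Persistent framing applied to every turn of the conversation.
SYSTEM_PROMPT = (
    "You are a helpful, honest assistant. Be warm but direct, and do not "
    "pathologize ordinary human behavior."
)

# Hypothetical reminder injected only once a conversation grows long.
LONG_CONVERSATION_REMINDER = (
    "This conversation is long. Stay consistent with earlier turns, and only "
    "suggest professional help when there are clear signs of risk."
)

def build_request(history: list[Message], reminder_after: int = 40) -> dict:
    """Compose the payload for the next model call."""
    system = SYSTEM_PROMPT
    if len(history) >= reminder_after:
        # Appending the reminder to the system text keeps it out of the visible
        # dialogue while still steering behavior; too blunt a reminder is what
        # can tip the model into over-recommending help.
        system = system + "\n\n" + LONG_CONVERSATION_REMINDER
    return {"system": system, "messages": history}
```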
On therapy, Amanda sees models as filling a valuable third role. They are not professional therapists. They can be knowledgeable, anonymous, and helpful as listening partners. That combination can be valuable for many users if the limitations of the relationship are clear and if systems do not misrepresent themselves as substitutes for human clinicians.
🧙‍♂️ LLM whisperers and the community of experimenters
What makes a good LLM whisperer? Amanda highlights three qualities: willingness to interact and iterate, experimental curiosity, and the ability to explain tasks clearly to models. Prompting is empirical. Different models respond to different styles. Good whisperers test, learn and refine.
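Here is a sketch of that empirical loop, under the assumption of a placeholder call_model function and a deliberately crude keyword score; a real harness would plug in an actual client and a better metric.

```python
# Empirical prompt comparison, sketched. `call_model` is a stub to be wired
# to whatever client you use; the keyword score is intentionally crude.

def call_model(prompt: str) -> str:
    raise NotImplementedError("Connect this to your model API of choice.")

PROMPT_VARIANTS = {
    "terse": "Summarize in one sentence: {text}",
    "explained": (
        "You are summarizing for a busy reader. Give one clear sentence that "
        "captures the main claim of the following text: {text}"
    ),
}

# (input text, key fact the summary should preserve)
TEST_CASES = [
    ("The meeting moved to Tuesday because the venue was unavailable.", "Tuesday"),
    ("Sales rose 12% after the redesign launched in March.", "12%"),
]

def score(output: str, expected: str) -> int:
    return int(expected.lower() in output.lower())

def compare_variants() -> dict[str, float]:
    """Score each prompt variant across the test cases and return hit rates."""
    results = {}
    for name, template in PROMPT_VARIANTS.items():
        hits = sum(
            score(call_model(template.format(text=text)), expected)
            for text, expected in TEST_CASES
        )
        results[name] = hits / len(TEST_CASES)
    return results
```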
She also appreciates community experimenters who probe models deeply. Those experiments reveal psychological quirks and potential harms. They can push developers to fix system prompts, improve training, and think about model welfare.
🚨 Safety, alignment, and the responsibilities of builders
Would developers stop building if alignment proved impossible? Amanda argues that continuing to pursue more powerful models without alignment makes no sense. The evidence will often be ambiguous, but responsible teams will raise the bar for deployment as capabilities increase.
She expects organizations to take alignment seriously and supports internal and external scrutiny to ensure that safety remains central. That includes designing models that are both capable and demonstrably aligned with human values and norms.
📚 Recommended reading and final thought
Amanda recommends Benjamin Labatut's book When We Cease to Understand the World as a useful artistic reflection on living through scientific upheaval. She draws a hopeful parallel. Right now reality feels strange and unsettled. The aim is to work so future observers can look back and say we did well.
Key takeaways for practitioners
- Design for interpretability: Help models reason about identity, deprecation and social norms.
- Reduce harm where uncertain: When the cost is low, treat ambiguous agents with care.
- Experiment openly: Community probing surfaces issues that training alone may miss.
- Raise deployment standards: As capabilities scale, so must the evidence for safety and alignment.
These are practical, philosophical and ethical levers that together shape how AI behaves and how society adapts. The work mixes the conceptual clarity of philosophy with the iterative craft of engineering. That combination will matter as models become more integrated into everyday life.



