Built for SF by SF: AI Solutions Helping Our City Thrive

I watched the original presentation from OpenAI that brought together civic leaders, students, and local founders to showcase how artificial intelligence is already helping San Francisco run better. In this report I summarize the session, highlight the projects presented, and unpack what their approaches mean for city services, community care, and urban planning. I write as an engaged observer—watching closely, weighing the claims, and thinking about what it takes to move from prototype to public impact.
🗣️ Opening remarks and a civic challenge
The event began with an energetic civic framing: San Francisco is both birthplace and testing ground for new technology. I heard that message repeated as a core thesis of the morning—AI isn't an abstract lab project; it's being built here and for here to solve real, everyday problems.
Mayor Daniel Lurie opened the session with a clear line between innovation and public service. He said,
"San Francisco is a city that creates the future."
That assertion shaped the tone for the projects that followed. The mayor described Solve SF, an app that makes reporting city maintenance issues easy by letting residents take a photo and having AI do the rest. In his remarks he emphasized a practical civic outcome: "Being able to make visible changes for our residents makes them feel like their government is working for them."
I appreciated how the mayor linked technology to outcomes residents care about—safety, responsiveness, and neighborhood vibrancy. He also stressed partnership: the city worked with the Solve SF team to make sure their app could interface properly with municipal APIs. That sort of collaboration is the backbone of civic tech that actually ships and scales.
📸 Solve SF — two clicks from photo to 311 report
I spoke with Patrick McCabe, the creator of Solve SF, and learned about a deceptively simple idea executed with modern AI: shorten the path from noticing a problem on the street to filing a complete 311 report.
Patrick's story began on daily neighborhood walks—what he called "log walks"—where he would see graffiti or trash and decide to report it. He found the city's official 311 app functional but tedious: some reports required a dozen steps, with choices that were confusing or error-prone. So he set out to make reporting as effortless as taking a photo.
Key features of Solve SF:
- Camera-first workflow: An action button opens the app directly into a camera view. Snap a photo, slide to submit, done.
- Image classification: An onboard classifier identifies the category of issue (graffiti, trash, etc.).
- Automated field mapping: GPT-5 (used on the backend) determines the appropriate municipal form, whether the issue is on public or private property, the object type (sidewalk, trash can), and even whether the content is offensive to prioritize response.
- Integration with city APIs: The app submits structured reports to the city's 311 system so public crews can respond and, in many cases, send back evidence of completion.
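To make the flow above concrete, here is a minimal sketch of how a photo-to-report pipeline like Solve SF's could assemble a structured 311 submission. The class, function, and field names are my own illustrations, not the app's actual API or the city's real schema.

```python
from dataclasses import dataclass

@dataclass
class Report311:
    """Hypothetical structured 311 submission (fields are illustrative)."""
    category: str            # e.g. "graffiti", "trash"
    on_public_property: bool
    object_type: str         # e.g. "sidewalk", "trash_can"
    offensive: bool          # used to prioritize response
    latitude: float
    longitude: float
    photo_ref: str           # pointer to the uploaded photo

def build_report(category: str, model_fields: dict, lat: float,
                 lon: float, photo_ref: str) -> Report311:
    """Combine the on-device classifier's category with the backend
    model's field mapping into one structured submission."""
    return Report311(
        category=category,
        on_public_property=model_fields.get("public_property", True),
        object_type=model_fields.get("object_type", "unknown"),
        offensive=model_fields.get("offensive", False),
        latitude=lat,
        longitude=lon,
        photo_ref=photo_ref,
    )

# Example: classifier says "graffiti"; the backend model fills in the rest.
report = build_report("graffiti",
                      {"object_type": "sidewalk", "offensive": False},
                      37.77, -122.42, "photo_0001.jpg")
print(report)
```

The point of the structure is that every report arriving at the city's endpoint has the same fields filled in, regardless of which resident filed it.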
Patrick emphasized that he doesn't come from a traditional software developer background—he's an electrical engineer who learned to build mobile apps using ChatGPT as a coding assistant. He used ChatGPT to write Swift and Kotlin code, to set up an AWS backend, and to integrate GPT-5 for image interpretation. He candidly said he still turns to ChatGPT to brainstorm features and design ideas.
Why this matters: reducing friction increases participation. Patrick's central hypothesis—that people will report more issues if reporting is quicker and easier—aligns with behavioral research about micro-commitments and friction. By compressing the 12-step process into two clicks and delegating clerical judgment to AI, Solve SF increases civic engagement while standardizing reports for faster municipal action.
Outcomes and evidence
Solve SF has already generated measurable activity: over 100,000 reports have been submitted through the app. The city has used those reports to pressure wash sidewalks, paint over graffiti, and pick up large volumes of trash. The app's ability to attach a photo at submission time and receive confirmation from crews closes the loop for residents, reinforcing the sense that reporting "worked."
Technology applied
On the tech stack Patrick described, the core ingredients are:
- Client-side image classifier: lightweight models on-device to suggest categories immediately and speed up the flow.
- Backend GPT-5: to take the image and context and decide the correct municipal fields and endpoints for submission.
- Cloud infrastructure (AWS): to host processing, queueing, and API calls.
This hybrid on-device / cloud approach balances responsiveness (instant classification in the UI) with richer reasoning and mapping capabilities in the cloud.
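A common way to implement that hybrid split is a confidence threshold: trust the fast local classifier when it is sure, and escalate to the cloud model otherwise. The sketch below uses stand-in functions and a made-up threshold; it shows the dispatch pattern, not Solve SF's actual code.

```python
CONFIDENCE_THRESHOLD = 0.85  # illustrative cutoff, not a real app setting

def classify_on_device(image_bytes: bytes) -> tuple[str, float]:
    # Stand-in for a lightweight on-device model returning
    # (category, confidence). Hard-coded here for illustration.
    return ("graffiti", 0.92)

def classify_in_cloud(image_bytes: bytes) -> str:
    # Stand-in for a richer cloud call (e.g. a vision-capable model
    # that also maps municipal fields).
    return "graffiti"

def categorize(image_bytes: bytes) -> str:
    """Use the fast local classifier when it is confident; otherwise
    escalate to the cloud model for deeper reasoning."""
    category, confidence = classify_on_device(image_bytes)
    if confidence >= CONFIDENCE_THRESHOLD:
        return category
    return classify_in_cloud(image_bytes)

print(categorize(b""))
```

Keeping the common case on-device preserves the instant feel of the UI and avoids a network round trip for easy classifications.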
Design choices that matter
Two choices stood out to me:
- Minimal user interaction: the app is optimized for the fewest possible taps. That matters for adoption because many residents will use the app opportunistically while walking.
- Decision automation: handling the tedious decisions (which city endpoint, public vs. private, offense level) reduces errors and results in higher-quality reports reaching municipal teams.
Those choices convert sporadic engagement into reliable input for municipal operations.
🗺️ VoiceReach — real-time outreach for neighborhood street teams
Next I spent time with Jason and Bowen, two college freshmen who launched VoiceReach with high school friends to help outreach teams who work directly with people facing homelessness, behavioral health issues, and related crises.
Their presentation was both striking in its empathy and impressive in its technical design. VoiceReach tackles a very specific operational challenge: street outreach is time-sensitive and personal, but reporting on interactions—filling forms or maintaining records—sucks time away from care. The founding team started from the assumption that even saving a few seconds per interaction could meaningfully increase the time responders spend providing services.
What VoiceReach does
- Voice-first data capture: Outreach workers record conversations in real time. The app transcribes using Whisper Real Time and then extracts structured fields with GPT-5 Nano so data is ready for search and dashboards.
- Centralized profile management: People encountered by outreach teams are saved as searchable profiles with demographics, medical conditions, history of interactions, and GPS location.
- Duplicate detection and merging: The system suggests matches when a person appears to exist already in the database—preventing fragmentation and ensuring continuity of care.
- Decision support: An assistant provides immediate guidance—e.g., "Check if John has stable access to insulin and food"—to help responders prioritize action.
- Dashboards and hotspot detection: Aggregated data shows emerging clusters of need and highlights where services should be focused.
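The transcript-to-structured-fields step is the heart of the capture workflow. In VoiceReach that extraction is done by GPT-5 Nano; the sketch below substitutes a simple regex-and-keyword stand-in just to illustrate the shape of the output a model would produce.

```python
import re

def extract_fields(transcript: str) -> dict:
    """Toy stand-in for model-based entity extraction: pull a few
    structured fields out of a free-form voice-note transcript."""
    fields = {}
    name = re.search(r"name is (\w+)", transcript, re.I)
    age = re.search(r"(\d+) years old", transcript)
    if name:
        fields["name"] = name.group(1)
    if age:
        fields["age"] = int(age.group(1))
    fields["veteran"] = "veteran" in transcript.lower()
    # Illustrative condition list; a real system would not keyword-match.
    conditions = [c for c in ("diabetes", "asthma") if c in transcript.lower()]
    if conditions:
        fields["medical_conditions"] = conditions
    return fields

note = "His name is John, he's 52 years old, a veteran, and has diabetes."
print(extract_fields(note))
```

Whatever does the extraction, the downstream value is the same: searchable, dashboard-ready fields instead of an audio file nobody revisits.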
The team built VoiceReach during a weekend hackathon and iterated quickly. They leveraged OpenAI tools including Codex for development assistance, GPT-5 Nano for lightweight on-device reasoning, Whisper for transcription, and text embedding models for search and matching.
Field workflow example
I walked through a sample flow Jason and Bowen demonstrated: a responder meets someone named John, records a quick voice note describing John’s age, veteran status, medical conditions, and housing struggles. Whisper transcribes in real time; GPT-5 Nano maps extracted entities into structured fields like name, height, weight, age, and medical conditions. When the responder saves the profile, the app checks the database and suggests a potential match to avoid duplicates. The responder merges the records if appropriate, and the unified profile now contains a consolidated interaction history and a priority status derived from risk factors like diabetes.
When the responder needs guidance, the assistant proposes practical next steps, such as connecting John to VA services or ensuring he has access to medication and housing referrals. The result is an increase in time spent on care, better continuity across teams, and data that can be used for coordinated, evidence-driven outreach.
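The duplicate check in that walkthrough is typically an embedding-similarity lookup: embed the new profile's text, compare against stored profile vectors, and suggest a merge above some threshold. The team mentioned using text embedding models for this; the sketch below uses tiny fixed vectors and an assumed threshold purely for illustration.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def suggest_match(new_vec, profiles, threshold=0.9):
    """Return the id of the closest existing profile above the threshold,
    or None if the person appears to be new."""
    best_id, best_score = None, 0.0
    for pid, vec in profiles.items():
        score = cosine_similarity(new_vec, vec)
        if score > best_score:
            best_id, best_score = pid, score
    return best_id if best_score >= threshold else None

# Fixed toy vectors stand in for real profile embeddings.
profiles = {"john_1": [0.9, 0.1, 0.3], "maria_2": [0.1, 0.8, 0.2]}
print(suggest_match([0.88, 0.12, 0.31], profiles))
```

Surfacing the match as a *suggestion* rather than an automatic merge keeps the responder in the loop, which matters when a wrong merge could mix two people's medical histories.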
Why this matters
VoiceReach addresses three systemic frictions:
- Time cost of documentation: reducing typing and form-filling frees responders to do their work.
- Data fragmentation: centralized profiles reduce siloed knowledge among departments and NGOs.
- Reactive response patterns: dashboards and hotspot detection let teams act proactively and allocate resources where they're needed most.
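Hotspot detection of the kind the dashboards rely on can be sketched very simply: bucket interaction coordinates into a coarse grid and flag cells whose counts exceed a threshold. The grid size and threshold below are illustrative assumptions, not VoiceReach's parameters.

```python
from collections import Counter

def hotspots(points, cell_size=0.01, threshold=3):
    """points: (lat, lon) pairs from field interactions.
    Returns grid cells with at least `threshold` interactions."""
    counts = Counter(
        (round(lat / cell_size), round(lon / cell_size)) for lat, lon in points
    )
    return [cell for cell, n in counts.items() if n >= threshold]

# Three nearby interactions plus one outlier.
interactions = [(37.781, -122.412), (37.782, -122.413), (37.779, -122.411),
                (37.701, -122.455)]
print(hotspots(interactions))
```

A production system would likely use proper geospatial clustering, but even this crude binning turns a log of individual encounters into a map of where need concentrates.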
In short, VoiceReach is a design for operational leverage: small efficiency gains that compound into better outcomes for vulnerable residents.
🏙️ City Science Lab — simulating a city's future with models and imagery
Peter Hirschberg and Kate Connolly presented a contrasting but complementary use of AI: instead of streamlining operations, they apply AI to urban planning and public engagement. Their City Science Lab, in collaboration with MIT Media Lab’s City Science Network, brings systems modeling, probabilistic forecasting, and image generation together to help citizens and planners visualize how neighborhoods could evolve.
Their work responds to a familiar problem: maps and zoning documents are technical and hard to interpret for most residents. Poor visualization opens the door to misinformation—people create exaggerated images of dense development that breed distrust.
Peter and Kate's approach uses data and generative models to produce realistic, contextual visualizations and performance projections that are anchored in rules and economic scenarios. These tools give residents and policymakers a shared language for discussing trade-offs.
How the lab models development potential
The lab’s process for imagining potential housing on a parcel involves several steps:
- Parcel filtering: Identify parcels that are legally and technically developable, exclude protected or otherwise constrained lots (for example, tenant-protected red lots that cannot be rebuilt).
- Economic feasibility scoring: Apply rules of thumb, code constraints, and economic models to give each parcel a probability score for redevelopment. This is where reinforcement fine-tuning can help encode nuanced judgment about what’s realistic versus theoretically possible.
- Design synthesis: Use ImageGen to render plausible massing and architectural styles that fit the neighborhood character—e.g., "Marina style" in the Richmond.
- Community preference integration: Incorporate both implicit data (behavioral patterns) and explicit inputs from planning interviews to suggest amenities like daycare that are likely to succeed in a given corridor.
The result is a data-driven, image-backed narrative for a block or corridor: not abstract zoning maps, but renderings that help residents imagine realistic futures and understand trade-offs.
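The filter-then-score structure of that pipeline can be sketched in a few lines. The scoring weights below are toy rules of thumb of my own invention; the lab's actual scoring draws on economic models and fine-tuned judgment far richer than this.

```python
def developable(parcel: dict) -> bool:
    """Filtering step: drop parcels that are legally or practically off-limits."""
    return not parcel.get("protected") and not parcel.get("tenant_protected")

def redevelopment_score(parcel: dict) -> float:
    """Toy probability score: older, underbuilt parcels score higher.
    (FAR = floor area ratio; built vs. allowed measures unused capacity.)"""
    score = 0.2
    if parcel.get("year_built", 2020) < 1950:
        score += 0.3
    if parcel.get("built_far", 1.0) < parcel.get("allowed_far", 1.0) * 0.5:
        score += 0.4
    return min(score, 1.0)

parcels = [
    {"id": "A", "year_built": 1930, "built_far": 0.5, "allowed_far": 3.0},
    {"id": "B", "protected": True},
]
candidates = [(p["id"], redevelopment_score(p)) for p in parcels if developable(p)]
print(candidates)
```

The key design idea survives even in this toy form: the filter encodes hard constraints, while the score expresses graded likelihood, and only the survivors go on to the rendering step.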
Using Sora to tell local stories
To layer human storytelling on top of data, the lab used Sora (a generative video tool) to create a short film featuring a local booster promoting redevelopment of an old theater. The film included a cameo by Sam Altman—explicitly noted to be a generated cameo used with permission—illustrating that citizens can produce polished narratives that help people think about change instead of fear it.
Why this approach matters for civic discourse
Visuals shape imagination. When neighbors see a speculative render that approximates a future block and includes likely amenities and transit connections, they can engage in a discussion grounded in possibility rather than anxiety. The lab’s tools aim to reduce misinformation, improve trust, and create a more productive planning conversation.
🔗 Common themes: what links these projects
Across Solve SF, VoiceReach, and the City Science Lab I noticed recurring themes about how AI is being applied at the city scale. Summarizing them helps explain why San Francisco is actively experimenting with these models.
Speed and accessibility
Each project reduces friction—either for residents submitting reports, responders documenting field interactions, or citizens understanding planning options. Faster workflows scale participation and make systems more responsive.
Augmentation, not replacement
Every presenter positioned AI as a tool to augment human judgment, not replace it. In Solve SF, GPT-5 fills in clerical details but municipal staff are still the ones performing the work. VoiceReach gives outreach workers more time and better data while leaving final care decisions to humans. City Science Lab uses models to inform public debate, not to dictate outcomes.
Human-centered design
All three teams prioritized the user's needs: a quick camera action for a passerby, a voice-first interface for time-pressed responders, or visualizations that speak the language of residents. Human-centered interfaces increase adoption—a small but crucial detail for public-interest tech.
Data integration and systems thinking
These aren't isolated apps; they aim to plug into existing municipal systems. Solve SF integrates with 311 APIs, VoiceReach seeks to unify records across departments and NGOs, and City Science Lab aggregates planning and parcel data. Integration is key to turning isolated signals into meaningful action.
🛠️ Technology deep dive: models, privacy, and implementation
I dug into the technical mixes the teams used to understand the trade-offs and practical constraints of building civic AI.
Models and roles
Different models were used for different roles:
- Whisper (and Whisper Real Time): speech-to-text for VoiceReach, enabling seamless transcription of field conversations.
- GPT-5 and GPT-5 Nano: reasoning and structured-data extraction. Larger models handled complex classification and municipal mapping; compact Nano models enabled faster, cheaper local inference for field use.
- ImageGen and Sora: generative visual and video tools for planning visualizations and storytelling.
- Codex (or similar tools): used by early-stage builders as a development assistant to accelerate shipping.
On-device vs cloud trade-offs
Teams combined on-device classification and cloud reasoning to balance speed, privacy, and cost. On-device models protect sensitive content from being sent to the cloud when possible and preserve a responsive user experience. Cloud models provide broader context and complex mapping but require careful design for data handling and latency.
Data governance and privacy
Working with people in crisis or handling images of private property raises significant privacy and ethics concerns. The presenters acknowledged the importance of careful data handling, and I recommend cities and makers consider several governance practices:
- Minimize data collection: collect only what is necessary for the task—examples: location, issue type, timestamp, and a supporting photo for 311—rather than unlimited personal data.
- Local inference when possible: use on-device models for initial classification to avoid sending raw audio or images to the cloud unless needed.
- Explicit consent and transparency: make clear to users what data is collected, why it’s used, and how long it’s retained.
- Access controls and auditing: limit who can see sensitive data and log access for accountability.
- Data sharing agreements: formalize when data can be shared across departments or with NGOs, and for what purposes.
VoiceReach, for example, must carefully reconcile the need for centralized profiles with privacy protections for vulnerable individuals. That entails strict role-based access and clear retention policies.
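Role-based access of that kind often boils down to a field-level allowlist per role. The roles and field groupings below are illustrative assumptions, not VoiceReach's actual policy.

```python
# Illustrative role-to-fields policy; real deployments would load this
# from audited configuration, not hard-code it.
ROLE_FIELDS = {
    "outreach_worker": {"name", "medical_conditions", "last_location", "history"},
    "analyst": {"last_location"},  # aggregate-only view, no identities
    "admin": {"name", "medical_conditions", "last_location", "history"},
}

def redact_profile(profile: dict, role: str) -> dict:
    """Return only the fields the caller's role is allowed to see."""
    allowed = ROLE_FIELDS.get(role, set())
    return {k: v for k, v in profile.items() if k in allowed}

profile = {"name": "John", "medical_conditions": ["diabetes"],
           "last_location": (37.78, -122.41), "history": ["2024-05-01 check-in"]}
print(redact_profile(profile, "analyst"))
```

Pairing a policy like this with access logging gives both halves of the governance story: limits on who sees what, and an audit trail of who actually looked.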
⚖️ Ethics, equity, and governance
Applying AI to civic problems amplifies both promise and risk. I spend a lot of time thinking about how these projects can be guided by ethical guardrails that protect residents while improving services.
Bias, fairness, and prioritization
Automated prioritization can be helpful—like flagging offensive graffiti or determining that someone with diabetes needs higher prioritization—but it can also encode biases. City systems must be transparent about how priority scores are computed and provide human oversight for appeals or corrections.
A few practical suggestions:
- Open algorithms where possible: publishing non-sensitive parts of the scoring logic helps build trust.
- Human-in-the-loop workflows: keep humans making final prioritization decisions, especially in borderline cases.
- Regular audits: measure model outputs across neighborhoods and demographic groups to detect disparities.
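The audit suggestion above is straightforward to operationalize: compute the automated flag rate per neighborhood (or demographic group) and compare. The records and neighborhoods below are invented for illustration; a real audit would pull from production logs.

```python
from collections import defaultdict

def flag_rates(records):
    """records: (neighborhood, was_flagged) pairs.
    Returns the fraction of records flagged per neighborhood."""
    totals, flagged = defaultdict(int), defaultdict(int)
    for hood, was_flagged in records:
        totals[hood] += 1
        if was_flagged:
            flagged[hood] += 1
    return {hood: flagged[hood] / totals[hood] for hood in totals}

records = [("Mission", True), ("Mission", False), ("Richmond", False),
           ("Richmond", False), ("Mission", True), ("Richmond", True)]
rates = flag_rates(records)
print(rates)
```

A large, persistent gap between neighborhoods in a metric like this is exactly the kind of signal that should trigger human review of the underlying model.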
Consent and dignity
Especially for tools like VoiceReach that work with people experiencing homelessness, consent and dignity are paramount. Recording interactions—even for good reasons—requires protocols for informed consent and must protect identities from unnecessary exposure.
Public accountability
Public-facing civic tools should be auditable. Residents need assurance that automated decisions can be traced and contested and that the city remains accountable for outcomes driven by AI-assisted workflows.
📈 From pilot to citywide adoption: scaling strategy
Getting an app to work in a single neighborhood is different from making it part of city operations. I considered what it will take for these projects to scale sustainably.
Key barriers to scaling
- Interoperability: different departments use different systems; integration requires API compatibility and standardized schemas.
- Operational ownership: who maintains the service? Open-source projects and small teams need government partnerships for long-term stability.
- Funding and procurement: cities have procurement rules that can slow adoption. Pilot projects must be designed to fit procurement pathways or receive special approvals.
- Training and change management: staff need training and ongoing support to adopt new tools, especially in high-stress roles like outreach teams.
- Public trust: scaling requires residents' trust that their data will be handled responsibly and that AI will augment human judgment rather than replace it.
Steps to accelerate impact
I see a pragmatic path to scaling that combines product design and civic process:
- Start with low-risk wins: projects that reduce clerical friction (like Solve SF) create visible, fast improvements and gentle proof points for the city.
- Document integration patterns: publishing a clear blueprint for connecting to 311 or other municipal endpoints makes it easier for other developers to adopt similar patterns.
- Formalize partnerships: sign memoranda of understanding between developers and departments to clarify responsibilities and data flows.
- Open-source core components: shared libraries for common tasks (audio transcription, duplicate matching, field mapping) reduce duplication and create community standards for civic AI.
- Build feedback loops: implement user feedback and performance monitoring so systems improve with use and maintainers can address issues quickly.
🔍 Real-world impact: what success looks like
All three projects described measurable signs of success that I think are useful to unpack because they give concrete metrics for other civic AI efforts.
Solve SF
- Adoption metric: 100,000+ reports submitted through the app.
- Service outcomes: visible city responses—pressure washing, graffiti abatement, trash pickup—with photographic confirmations in many cases.
- Behavioral effect: increased reporting frequency by reducing friction to two clicks.
VoiceReach
- Operational efficiency: reduction in time spent per interaction through voice-first capture and automated structuring.
- Continuity of care: centralized profiles reduce duplicated outreach and improve follow-up consistency.
- Data-driven outreach: hotspot maps allow teams to allocate resources proactively.
City Science Lab
- Public understanding: renderings and scenario modeling reduce misinformation about redevelopment.
- Deliberative value: visualizations combined with amenity suggestions (e.g., daycare) help planners and residents discuss trade-offs grounded in data.
- Engagement: generated video narratives let community advocates and planning departments communicate futures in accessible formats.
Success for civic AI is not purely a technology metric—it's measured by people’s lived experience: faster cleanups, better access to services, more constructive public conversations about growth.
🔄 Risks and failure modes I watched for
No technology is risk-free, and these projects surface the kinds of failure modes cities must plan for.
Overreliance on automated labeling
If staff or residents begin to trust automated labels without verification—like offense flags or priority scores—there’s a risk of misplaced resources or unreviewed escalations. Human validation remains essential.
Data drift and model degradation
Models trained on a snapshot of data can underperform over time. The city and developers must set up continuous monitoring and retraining strategies to ensure accuracy doesn't degrade in the field.
Equity gaps
Tools that aggregate data can obscure underreported populations. For example, certain neighborhoods might be less likely to use an app like Solve SF or to be reachable by outreach teams; that leads to blind spots unless proactively addressed.
Security and misuse
Open APIs and images could be misused if not secured. Authentication, rate limiting, and logging are basic protections that civic projects must implement from day one.
📣 My recommendations for cities and builders
Based on what I observed, I offer practical recommendations for other cities or teams looking to replicate these successes.
For city leaders
- Design for partnership: create clear, lightweight pathways for startups to integrate with city APIs and operations. These can include developer keys, sandbox environments, and data sharing agreements designed for small teams.
- Prioritize small wins: focus on use cases with clear benefits and low risk—cleaning, reporting, data consolidation—before tackling more invasive interventions.
- Invest in governance: set up ethical review boards and data governance frameworks early, particularly for initiatives that touch sensitive populations.
For builders and technologists
- Start with users: design around the people who will actually use the product—residents, outreach teams, planners—rather than the technology itself.
- Design for integration: build with standard APIs and data schemas so your product can plug into municipal workflows with minimal friction.
- Invest in explainability: make automated decisions transparent and reversible; document heuristics and provide human review pathways.
- Plan for maintenance: small teams require clear plans for long-term hosting, incident response, and handoffs to city IT or non-profits if adoption grows.
👥 Community, capacity, and the role of students
One of the most inspiring aspects of the session for me was the participation of students—four high school friends who turned a weekend hackathon into VoiceReach. Their example highlights the untapped civic potential of students and early-career technologists.
Students bring imagination, urgency, and the ability to iterate quickly. When partnered with city teams that can provide guidance and access to real problems, student-built solutions can accelerate innovation while also cultivating the next generation of civic technologists.
🔭 Looking ahead: what the next 12–24 months could bring
From my vantage, the pipeline for civic AI in San Francisco (and in comparable cities) is promising. Here are several realistic near-term developments I expect to see:
- More camera-first citizen services: apps like Solve SF will expand beyond cleanliness to other categories such as infrastructure damage or accessibility issues.
- Voice-powered field tools: more outreach and public safety workflows will adopt on-the-fly transcription and entity extraction to reduce administrative burden.
- Integrated data platforms: municipalities will invest in central platforms that aggregate signals from apps, 311, and field teams to produce coordinated action plans.
- Policy and governance frameworks: as adoption grows, cities will build legal and ethical frameworks to govern AI-driven civic tools—covering procurement, privacy, and public accountability.
- Community-driven visualization: planning departments and labs like the City Science Lab will use mixed reality, image generation, and story-driven video to democratize urban planning debates.
📌 What I learned—and what I’ll be watching next
Being present at the session reinforced several lessons I find important:
- Practical tools beat grand promises: the most compelling demos were not the fanciest algorithms but those that solved concrete pain points in straightforward ways.
- Trust is both earned and designed: transparency, easy user controls, and visible results (like before/after photos) are powerful trust-building devices.
- Human systems still matter: staffing, interdepartmental coordination, and procurement processes can make or break adoption far more often than technical limitations.
- Students and small teams matter: nimble teams can prototype quickly and unlock surprising value when given access to real problems and data.
I'll be watching how these projects evolve: whether Solve SF can institutionalize integration patterns with other cities, whether VoiceReach develops robust consent and governance models for sensitive data, and whether the City Science Lab influences real planning decisions with its visualizations.
✅ Conclusion: San Francisco as a living lab and model
What I took away from the presentations is that San Francisco is positioning itself as a testbed where civic needs and cutting-edge AI meet. The projects I covered—Solve SF, VoiceReach, and the City Science Lab—represent distinct but complementary vectors of impact:
- Solve SF improves the everyday interface between residents and municipal services by reducing friction and increasing civic participation.
- VoiceReach strengthens frontline service delivery by folding voice and AI into outreach workflows, increasing the time responders spend helping people and improving continuity across teams.
- City Science Lab helps shape civic discourse by producing realistic, data-grounded visualizations that reduce misinformation and enable better planning conversations.
All three projects demonstrate a pragmatic model for civic AI: start small, design for people, integrate with city systems, and build governance into the product from the start. I left energized by the creativity in the room and convinced that, if cities and developers continue to partner in this way, AI can help make urban life safer, cleaner, and more equitable.
If you're a builder, a city official, or simply a resident curious about the future of urban technology, the model shown here is worth studying: use AI to remove friction, amplify human judgment, and make government feel more responsive. That is how innovation becomes service, and how service becomes trust.