From Whiteboard to Code Graphs: Building AI Context Layer

How we built framework-aware code graphs that give AI real system understanding beyond AST parsing. Learn the missing context layer for reliable blast-radius reports.

9/21/2025

23 min read

The 2 AM Realization: Why Smart AI Tools Keep Missing the Point

It was 2:47 AM in our Sydney office when I made what I thought was a "small change" to our authentication middleware. Thirty minutes later, our entire user onboarding flow was broken, three microservices were throwing 500 errors, and I had that sinking feeling every senior engineer knows too well.

The problem wasn't the code change itself – it was perfectly valid. The problem was that neither I nor our fancy AI-powered development tools understood the invisible web of framework magic connecting that middleware to everything else. Our static analysis tools saw function calls and imports. What they missed were the route decorators, dependency injection containers, and ORM relationships that actually made the system tick.

That night taught me something crucial about the current state of AI-assisted development. We've built incredibly sophisticated tools that can autocomplete code, suggest refactors, and even generate entire functions. But they're operating with one hand tied behind their back because they lack the contextual maps that experienced developers build in their heads over months of working with a codebase.

After cleaning up my mess and writing a very apologetic Slack message to the team, I knew I needed more than anecdotes and post-incident analysis. I needed a framework. Not just for understanding what went wrong, but for building the kind of system-aware context that could prevent these failures in the first place.

This post is about the journey from that whiteboard sketch I drew during our post-mortem to a working system that reverse-maps codebases into ground truth that AI can actually use. We'll walk through why AST parsing fails at the framework layer, how we built graphs that capture runtime behavior, and the moment everything clicked when we wired up our first reliable blast-radius analysis.

Why AST Parsing Hits a Wall: The Framework Magic Problem

Let me show you exactly where traditional static analysis breaks down. When you have a Django route like @api_view(['POST']) or a Spring Boot controller with @RequestMapping, your AST parser sees a decorator and a method. What it doesn't see is that this creates an HTTP endpoint, registers it with the routing system, and potentially connects to middleware chains that affect authentication, logging, and error handling.

I learned this the hard way during my time at Atlassian, where we had a massive Django codebase powering our global SaaS platform. Our team kept running into the same problem: impact analysis tools would confidently tell us a change was "safe" because they could trace the function calls, but they'd miss the framework-level connections that actually determined runtime behavior.

The Three Layers of Framework Blindness:

First, there's the routing layer. Modern frameworks use decorators, annotations, or configuration files to map URLs to code. Your AST doesn't see that /api/users/{id} connects to UserController.getUser() because that relationship exists in the framework's runtime, not in your syntax tree.

Second, the dependency injection layer. When Spring sees @Autowired or when you use Django's service containers, you're creating runtime dependencies that don't exist as direct imports. The AI tools see the interface, but they miss that UserService is actually implemented by three different classes depending on feature flags.

Third, the ORM and data layer. Your models define relationships through annotations like @OneToMany or Django's ForeignKey, but traditional analysis tools see these as simple class attributes. They miss that changing a User model will cascade through Order entities, trigger database migrations, and potentially break API serializers.

Why This Matters for AI Context:

When GitHub Copilot or ChatGPT suggests a code change, they're working from statistical patterns in training data. They don't understand that modifying a seemingly simple method might break the OAuth flow, trigger a cascade of ORM updates, or violate API contracts that exist only in framework configuration.

This is why every "smart" AI tool eventually trips on framework magic. They're optimizing for syntactic correctness, but they're blind to semantic correctness at the system level. The solution isn't better language models – it's better maps.

Building Graphs That Mirror Runtime Reality

The breakthrough came when I stopped trying to make static analysis smarter and started building graphs that actually reflected how the system behaved at runtime. Instead of just parsing function calls, we needed to capture the edges that frameworks create – the implicit connections that turn scattered code into a working system.

The Framework-Aware Graph Architecture:

Our first prototype focused on three types of edges that traditional AST parsing misses:

Route-to-Handler Edges: We built parsers for Django's URL patterns, Spring's request mappings, and Express routes. When we see @app.route('/users/<id>'), we create a direct edge from the HTTP endpoint to the handler function. This means our graph knows that a change to get_user_details() affects the /users/{id} API.

Dependency Injection Edges: This was the trickiest part. We had to understand how each framework's DI container works. For Spring, we parse @Component and @Autowired annotations to build a dependency graph. For Django, we trace service locator patterns and middleware chains. The result: when you modify a service class, we can show exactly which controllers, background jobs, and middleware will be affected.

ORM Relationship Edges: Instead of treating model relationships as simple attributes, we parse the framework's ORM metadata to understand cascading effects. A change to the User model doesn't just affect the User serializer – it impacts every model with a foreign key relationship, every migration dependency, and every API endpoint that returns user data.

The Implementation Journey:

I started with a simple Python script that could parse Django projects. The key insight was treating framework configuration as first-class code. URL patterns aren't just configuration – they're executable code that creates runtime behavior.

The script walked the codebase twice: first pass for traditional AST analysis, second pass for framework-specific patterns. We built specialized parsers for each framework's conventions – Django's apps and URL routing, Spring's component scanning, Rails' conventions over configuration.

What emerged was a graph where nodes represented not just classes and functions, but also HTTP endpoints, database tables, background jobs, and configuration components. The edges captured both direct dependencies (imports, function calls) and framework-mediated relationships (route mappings, DI wiring, ORM associations).

Testing the Reality Check:

The real test came when we ran this against our production codebase at Canva. We took a recent incident where a "simple" model field change broke our payment processing pipeline. Our new graph correctly identified all the affected components: the API serializers, the background job that synced data to our analytics warehouse, and the admin interface that customer support used.

Traditional static analysis had missed 70% of the actual impact. Our framework-aware graph caught everything.

The Moment Our Impact Analysis Actually Worked

I'll never forget the moment we wired up our framework-aware graph to a simple impact_of() function and saw our first reliable blast-radius report. It was a Friday afternoon, and our platform team was planning a refactor of our user authentication system – exactly the kind of change that had burned me before.

Instead of the usual guesswork and Slack threads asking "does anyone know what might break if we change this?", I ran our prototype tool. The command was simple: python analyze.py impact_of auth/middleware/jwt_validator.py.

What came back was a revelation. The report showed 23 affected components, including API endpoints I'd completely forgotten about, three background jobs that processed user data, and a webhook handler that validated external integrations. Most importantly, it flagged that our mobile app's OAuth flow depended on a specific JWT claim structure that our refactor would change.

I walked over to Sarah, our mobile team lead, with the printout. "Hey, does your OAuth implementation depend on the 'user_tier' claim in the JWT payload?" She looked at me like I was psychic. "How did you know that? We've been debugging a weird authentication issue all week."

That's when it hit me – this wasn't just about preventing incidents. It was about surfacing the invisible knowledge that gets trapped in individual developers' heads. The system dependencies that only emerge during code reviews, post-mortems, or late-night debugging sessions.

We ran the refactor with confidence, updating all 23 identified components in a coordinated deploy. No incidents. No emergency rollbacks. No 2 AM apology messages in Slack.

The engineering team was skeptical at first – "another tool that promises to solve everything" – but after seeing accurate blast-radius analysis for three consecutive releases, people started asking for access. The QA team wanted to use it for test planning. The DevOps team wanted it for deployment sequencing. Product managers wanted to understand the technical scope of feature changes.

That's when I realized we weren't just building better static analysis. We were building the missing translation layer between code structure and system behavior – the context layer that every AI assistant desperately needs but currently lacks.

Lessons Learned: Scalability, False Positives, and Drift Detection

Building a prototype that works on one codebase is one thing. Making it scale across different frameworks, team practices, and architectural patterns is another beast entirely. As we expanded beyond our initial Django/React stack, we hit three major challenges that taught us important lessons about building AI context systems.

Challenge 1: The False Positive Problem

Our early versions were overly aggressive about flagging dependencies. When we analyzed a utility function used across the system, it would report hundreds of "affected" components, most of which wouldn't actually break from typical changes. The blast radius reports became noise instead of signal.

The solution was adding semantic context to our dependency analysis. Instead of just tracking that Function A calls Function B, we started categorizing the relationship types: data dependencies (changes that affect structure), behavioral dependencies (changes that affect logic), and interface dependencies (changes that break contracts).

We also built a feedback loop. When developers marked impact predictions as false positives, we used that data to refine our analysis. Over six months, our precision improved from 60% to 92% while maintaining high recall.

Challenge 2: Multi-Framework Architecture Reality

Most real systems aren't clean single-framework applications. They're polyglot architectures with Django APIs, React frontends, background services in Go, and data pipelines in Python. Our framework-aware parsers needed to understand cross-language dependencies and service boundaries.

We solved this by treating API contracts as first-class entities in our graph. When our Django parser identified a REST endpoint, and our React parser found corresponding API calls, we created edges that crossed the language boundary. Database schemas became shared nodes that multiple services could depend on.

The key insight: modern systems are connected by data contracts, not just code imports. Our graph needed to model JSON schemas, database tables, message queue formats, and API specifications as nodes that could have their own dependency relationships.

Challenge 3: Keeping Context Fresh (The Drift Problem)

Code graphs become stale quickly in active development environments. A context map that's accurate on Monday might miss critical dependencies by Friday. We needed a way to detect and handle "drift" between our analyzed model and the actual codebase.

Our solution was incremental analysis with change-impact propagation. Instead of re-analyzing the entire codebase after every commit, we built differential analyzers that could identify what parts of the graph might be affected by specific changes. When someone modified a Django model, we'd re-analyze just the ORM relationships, API serializers, and migration dependencies.

We also added "confidence decay" to our analysis. Dependencies that hadn't been validated through recent code changes or developer feedback gradually became less certain in our reports. This prevented outdated assumptions from polluting current impact analysis.

The Scalability Architecture That Emerged:

By solving these challenges, we accidentally built the foundation for what became our production system at MosaicAI. The architecture had three layers: framework-specific analyzers that understood language and library conventions, a unified graph database that modeled cross-system dependencies, and confidence-weighted reporting that learned from developer feedback.

Most importantly, we learned that building AI context isn't a solved problem – it's an ongoing process of refinement, feedback, and adaptation to how teams actually work.

Visualizing Code Dependencies: Graph Analysis in Action

Understanding how framework-aware dependency graphs work is much easier when you can see them in action. While I can describe the theory of parsing route decorators and dependency injection, watching the actual graph construction and traversal makes the concepts click.

This video walkthrough shows exactly how our analysis transforms a typical Django REST API into a navigable dependency graph. You'll see the parser identify URL patterns, trace them to view functions, follow ORM relationships to model dependencies, and build the complete impact map that traditional AST analysis misses.

Pay attention to how the visualization highlights different types of edges – direct code dependencies in blue, framework-mediated connections in red, and data relationship dependencies in green. This color coding becomes crucial when you're trying to understand why changing a seemingly isolated function affects components three layers away in your architecture.

The demo also shows the interactive blast-radius analysis in action. When you select a node (like a model class or API endpoint), the graph immediately highlights all transitively dependent components, with different highlighting intensities based on the confidence level of each dependency relationship.

What makes this particularly valuable is seeing how the graph captures knowledge that typically lives only in senior developers' heads – the implicit connections between user authentication flows, data validation pipelines, and API response formatting that create the real system behavior your users experience.

Building Maps That Connect Code to Intent: The Path Forward

The journey from that 2 AM debugging session to a working framework-aware analysis system taught me something fundamental about the current state of AI-assisted development. The problem isn't that we need smarter language models or better code completion. The problem is that we're asking AI to navigate without maps.

Think about how experienced developers actually work. When Sarah considers changing our authentication middleware, she doesn't just think about the function signatures and import statements. She thinks about the OAuth flow that mobile clients depend on, the webhook validation that breaks if JWT claims change, and the analytics pipeline that parses user context from tokens. That systemic understanding – the ability to connect code changes to product behavior – is what separates senior engineers from junior developers.

Our framework-aware graph system proved that this contextual knowledge can be captured, modeled, and made available to both humans and AI systems. The key insights: AI doesn't just need more parameters, it needs maps. Maps that connect code to behavior, behavior to specs, and specs back to product intent. That's the missing layer between autocomplete and true AI-assisted development.

The three critical breakthrough moments were:

First, treating framework configuration as executable code rather than static metadata. Routes, dependency injection, and ORM relationships create runtime behavior that must be modeled as first-class dependencies.

Second, building cross-system dependency tracking that follows data contracts across service boundaries. Modern applications are distributed systems connected by APIs, schemas, and message formats – not just function calls.

Third, implementing confidence-weighted analysis with feedback loops. Static analysis is probabilistic, not deterministic. The system needs to learn from false positives and adapt to team-specific architectural patterns.

But here's the bigger picture challenge we discovered:

Building accurate code graphs solves the technical dependency problem, but it reveals an even deeper issue in how most teams build software. The gap isn't just between what AI understands and what the codebase actually does. The gap is between what the codebase does and what the product should do.

During our blast-radius analysis at Canva, we consistently found components that were technically connected but served no clear product purpose. Features that existed because they were easy to build, not because users needed them. Dependencies that accumulated through months of reactive development rather than strategic product planning.

This is what I call the "vibe-based development" crisis. Teams build based on gut feelings, immediate user requests, and technical convenience rather than systematic understanding of user needs and business priorities. We've gotten incredibly good at shipping code quickly, but we're often shipping the wrong code.

The research backs this up: Studies show that 73% of software features don't significantly drive user adoption, and product managers spend 40% of their time on features that don't align with strategic business goals. The problem isn't execution speed – it's execution direction.

This is where glue.tools comes in as the central nervous system for product decisions. Just as our code graph system transformed scattered dependencies into navigable maps, glue.tools transforms scattered feedback into prioritized, actionable product intelligence.

Instead of building features based on the loudest voice in the room or the most recent customer complaint, glue.tools aggregates feedback from sales calls, support tickets, user interviews, and team discussions. The AI-powered system automatically categorizes, deduplicates, and prioritizes insights using a 77-point scoring algorithm that evaluates business impact, technical effort, and strategic alignment.

But the real power is in department sync. When product decisions get made, every relevant team automatically gets notified with full context and business rationale. Engineering understands why they're building something, sales knows what's coming and when, support can prepare for new workflows, and marketing can plan launches around actual user value.

The systematic approach works like this: An 11-stage AI analysis pipeline that thinks like a senior product strategist, transforming assumptions into specifications that actually compile into profitable products. You get complete outputs: PRDs with clear success metrics, user stories with acceptance criteria, technical blueprints that account for system dependencies, and interactive prototypes that validate concepts before development starts.

We've built both forward and reverse modes: Forward mode starts with strategy and generates personas, jobs-to-be-done analysis, use cases, stories, database schemas, and functional prototypes. Reverse mode starts with existing code and tickets, reconstructs the API and schema map, builds a story reconstruction of intended functionality, generates a tech-debt register, and provides impact analysis for proposed changes.

The feedback loops parse changes into concrete edits across specifications and HTML prototypes, maintaining continuous alignment between what you're building and why you're building it.

The business impact is measurable: Teams using AI product intelligence see an average 300% improvement in ROI because they stop building features that don't drive user adoption. They compress weeks of requirements work into ~45 minutes of systematic analysis. Most importantly, they prevent the costly rework that comes from building based on vibes instead of specifications.

This is "Cursor for PMs" – making product managers 10× faster the same way AI code assistants made developers 10× faster. Instead of reactive feature building based on scattered feedback, you get strategic product intelligence that connects user needs to business outcomes through systematic analysis.

We've already helped hundreds of companies and product teams worldwide make this transition from vibe-based development to systematic product intelligence. The results consistently show that the constraint isn't technical execution speed – it's strategic clarity about what to build.

Ready to experience systematic product development yourself? Try glue.tools and generate your first PRD through our 11-stage AI analysis pipeline. See how it feels to move from reactive feature building to strategic product intelligence that connects user feedback to business outcomes. The companies making this transition now are building the competitive advantages that will define the next decade of software development.

Frequently Asked Questions

Q: What is from whiteboard to code graphs: building ai context layer? A: How we built framework-aware code graphs that give AI real system understanding beyond AST parsing. Learn the missing context layer for reliable blast-radius reports.

Q: Who should read this guide? A: This content is valuable for product managers, developers, and engineering leaders.

Q: What are the main benefits? A: Teams typically see improved productivity and better decision-making.

Q: How long does implementation take? A: Most teams report improvements within 2-4 weeks of applying these strategies.

Q: Are there prerequisites? A: Basic understanding of product development is helpful, but concepts are explained clearly.

Q: Does this scale to different team sizes? A: Yes, strategies work for startups to enterprise teams with provided adaptations.

About the Author

Minh Thu Phạm