A new playbook for legal AI
We may assume that buying artificial intelligence tools is like buying traditional software. It's not.
It can feel like every third article published online today is about artificial intelligence. Between the breathless hype and the constant warnings of catastrophe, it is easy to want to tune out the whole conversation.
Some observers even argue that large law firms are helping to fund the very AI tools that may one day automate parts of their own work.
Still, if we move past the noise, there is real value in having a structured and practical discussion about what these tools actually do.
Lack of consistent standards
Legal teams have well-established playbooks for hiring human talent. We know exactly what questions to ask, what credentials to look for, and how to verify references.
Yet when it comes to hiring AI agents, that operational discipline often vanishes. Instead, we see legal teams trusting a vendor with a good pitch, attempting to build an entire proprietary model from scratch, or testing every new tool that hits the market.
In practice, legal AI procurement is still in its early stages and lacks consistent standards. Right now, 88 per cent of legal teams are not committed to their current AI vendor, and 63 per cent evaluate these tools without any IT or security involvement. With more than 800 AI tools targeting the legal sector, this approach leads to months of duplicated effort, inconsistent evaluation, and decisions driven by polished marketing rather than evidence.
To make the invisible procurement process visible, Legal Benchmarks launched the Legal AI Evaluation Framework. This free, open-access playbook provides legal teams with a structured, practical approach to evaluating third-party AI vendors. I was fortunate to contribute to this project as a member of the steering committee, where my role involved helping refine the criteria legal teams use when speaking with vendors and negotiating products.
A profession in motion
There is a common refrain that the legal profession is ancient, conservative, and slow to adopt technology. But as lawyers like to say, it depends.
Across the profession, legal practitioners are already sharing practical experiences with AI. The early panic about hallucinations is fading as the models improve and errors become less common. I suspect this wave of technology will unfold differently from past ones. Institutional change may move slowly because of regulation and the many stakeholders involved, but the reality on the ground is moving quickly. Individual lawyers are already using these tools, and many firms have introduced policies to protect client data and guide responsible use.
The conversation is also active in legal education and scholarship. Law schools and scholars across Canada are discussing AI in classrooms, blogs, and articles. On my podcast, I spoke with two law students, now lawyers, who helped develop references for the transparent signalling and classification of AI-assisted legal work as part of a collection of articles written with the help of AI.
The invisible challenge
Legal careers are shaped less by credentials and more by communication, and no one teaches that early enough. This truth extends perfectly to technology. We spend hours learning how to craft the perfect prompt for an AI tool because we realize that, without deep context and clear instructions, the output is useless.
Yet I often wonder whether we are applying the same rigour to our human counterparts. Are we giving our colleagues the right "prompts" — the necessary background, the clear constraints, and the actionable information they need to succeed? Ironically, learning to communicate effectively with a machine might just be the exact training we need to communicate better with each other, at least in the workplace, where giving clear instructions and providing context are paramount.
Integrating AI into a firm is fundamentally a communication challenge on a larger scale. What is the unspoken rule here? It is that adopting AI requires translating legal needs into technical requirements, bridging the gap between legal practice, IT, and security. We may assume that buying AI is like buying traditional software. It is not.
The power of shared expertise
Last year, I wrote about completing 75 conversations with lawyers around the world to celebrate the 75th anniversary of the University of Toronto Faculty of Law. What stood out to me then was the overwhelming generosity of spirit in every meeting. Not every connection needs an immediate result to have value; showing up consistently and sharing insights is what builds a strong community over time.
I saw that exact same generosity at work in building the Legal Benchmarks framework. It is a massive collaborative effort developed by legal, privacy, security, and technology professionals working across more than 100 organizations. It shows that when our industry faces a complex change, the most useful resource is not a proprietary algorithm. It is the people around us.
The three stages of evaluation
The framework consolidates the procurement lifecycle into three practical stages. As with hiring a knowledge worker, each stage demands a different level of scrutiny.
Stage 1: Pre-demo vetting. Think of this as the resume screening phase. It is a first-pass screening based solely on publicly available materials to decide whether a vendor even warrants a live demo. The framework emphasizes that security and data privacy are threshold criteria; a failure on either should disqualify the tool before a demo is ever booked.
Stage 2: Demo validation. This is the interview. The goal is to test and validate the vendor's claims in a live setting, using your own use cases and documents. You must verify whether the product performs as claimed in practice, focusing only on what can be credibly demonstrated in the session.
Stage 3: Pilot testing. This is the working trial. It provides a structured method to assess the tool's real-world performance using real workflows, users, and documents in a controlled environment. For a focused pilot, the framework recommends a minimum of four weeks to ensure users encounter edge cases in their day-to-day work, rather than just polished test scenarios.
The eight core criteria
Across these three stages, the framework evaluates tools on eight core criteria.
Strategic fit and functionality: Does the solution align with your priority use cases and fit naturally into the tools lawyers already use, like Word or a document management system?
Robustness: How reliable are the outputs? The framework forces legal teams to verify factual accuracy and check whether the vendor provides a credible account of how the solution handles hallucinations. Furthermore, outputs must include citations, sources, or document anchors.
Security and data privacy: This goes far beyond standard software checks. You must confirm an explicit "no training on customer data" position. You also need to fully understand the architecture and data flow, including tenant isolation.
Vendor risk, adoption support, and cost: Who is building the tool, and what happens if you leave? You need to assess data export and exit strategies. You must also map the true total cost of ownership, including the internal operational capacity required to deploy and maintain the tool.
Moving the profession forward
What mistake do people make because no one explained this? They evaluate AI in isolation, driven by fear of missing out rather than evidence.
AI is changing the way legal teams operate in-house and in private practice. Beyond the technology itself, success depends on operational discipline and clear thinking about risk and value. This framework is a practical guide for lawyers to adopt AI responsibly, proving that collaboration and shared expertise are what truly move the profession forward.