Now Assist Guardian (Safety and Security)
In Chapter 1, I briefly mentioned that Guardian monitors AI outputs and keeps things safe. Now let's get into the detail. How does it actually work? What does it protect against? And why should prompt engineers care?
Guardian exists because generative AI is inherently unpredictable. The same prompt with the same input can produce different outputs. That's the nature of language models. In a creative writing context, that variability is a feature. In an enterprise context, it's a risk.
Guardian provides the safety layer that makes AI viable for enterprise use.
What Guardian Actually Does
Guardian is built on a ServiceNow Small Language Model specifically trained to evaluate content. It monitors both inputs (what users send to the AI) and outputs (what the AI generates back). Think of it as a security checkpoint that every AI interaction passes through.
Three core areas of focus.
Offensive or harmful content covers language that's toxic, defamatory, or fraudulent. Guardian scans both user requests and AI responses for this type of content. If someone tries to use the AI to generate something inappropriate, Guardian catches it. If the AI generates something problematic, Guardian catches that too.
Prompt injection attempts are sophisticated attacks where someone tries to override the AI's instructions. These might involve role playing scenarios, instructions to ignore previous commands, or other manipulation techniques. Guardian has been trained to recognise these patterns and block them before they reach the language model.
Filtered subjects allow organisations to block content related to predefined sensitive topics. Maybe your organisation has topics that should never appear in AI generated content. Guardian can enforce those restrictions consistently.
Input Screening vs Output Screening
Guardian operates at two stages, and both matter.
Input screening happens before the request reaches the language model. If someone submits a prompt injection attempt or a request for harmful content, Guardian can block it immediately. The LLM never sees the problematic request.
Output screening happens after the language model generates its response. Even with good input screening, language models can occasionally produce unexpected outputs. Output screening catches anything problematic before it reaches the user.
This dual layer approach means protection at both ends of the AI interaction.
Configuration Options
Guardian isn't just an on/off switch. You configure it to match your organisation's requirements.
Blocking vs logging gives you flexibility in how Guardian responds. You can configure it to actively block requests that trigger concerns, displaying a fallback message instead of processing the request. Or you can configure it to log events for review while still allowing processing. Most organisations start with logging to understand patterns, then enable blocking for specific threat types.
Sensitivity levels let you tune how aggressively Guardian intervenes. Highly regulated industries might want stricter thresholds. Others might prefer lighter touch monitoring with human review of flagged content.
Custom filtered subjects allow you to define organisation specific topics that should trigger Guardian intervention. These extend beyond the default categories to cover your specific requirements.
Data Privacy Masking
Alongside Guardian, ServiceNow provides Data Privacy for Now Assist. This works differently but serves a related purpose.
Data Privacy masking transforms sensitive information in prompts and responses before data leaves your instance. Even during processing, raw PII never reaches the external model if you're using third party providers.
Several masking techniques are available depending on your needs.
Synthetic replacement swaps sensitive data with realistic but fake placeholders. Jane.doe@example.com becomes something generic but coherent. The prompt still makes grammatical sense, but actual PII is removed.
Static value replacement uses fixed placeholders like "999-99-9999" for things like national ID numbers. The structure remains, but the content is completely anonymised.
Partial replacement obscures portions while retaining some digits. Useful when agents need partial visibility, like confirming the last four digits of an account number.
The Defence in Depth Model
I covered this in the Framework Components page, but it's worth reinforcing here. Guardian is one layer in a multilayered security model.
Role Based Access Control determines what data users can access. RAG constraints limit what knowledge sources the AI can reference. Data Privacy masking transforms sensitive data before processing. Guardian monitors inputs and outputs for harmful content.
Each layer catches what others might miss. A prompt injection attempt might bypass RBAC (since it's about manipulation, not access). Data masking might not catch a request for harmful content generation. Guardian provides the content focused protection that complements the other layers.
The Prompt Engineering Connection
Why should prompt engineers care about Guardian? Several reasons.
Well structured prompts are harder to inject. If your prompt template is vague or permissive, it's easier for malicious users to override instructions. Rigid, structured prompts with clear roles, objectives, and constraints provide the AI with strong initial instructions that resist manipulation.
Understanding Guardian helps you troubleshoot. If a skill suddenly stops working for certain inputs, Guardian might be intercepting them. Knowing how to check Guardian logs helps you understand what's happening.
Output restrictions might affect your prompts. If you're designing prompts that touch sensitive topics (even legitimately), Guardian might flag outputs. You need to understand what triggers intervention so you can design appropriately.
Testing should include Guardian scenarios. When you test custom skills, test edge cases that might trigger Guardian. Make sure your fallback handling works properly.
Practical Considerations
A few things I've learned working with Guardian in production environments.
Start with logging before blocking. Understand what Guardian is catching before you start blocking requests. You might discover legitimate use cases being flagged.
Review logs regularly. Guardian logs provide insight into how people are using (and potentially misusing) your AI capabilities. This intelligence is valuable for ongoing governance.
Communicate with users. If Guardian blocks a request, users should understand why. Generic error messages frustrate people. Clear explanations build trust.
Work with your security team. Guardian configuration should align with broader security policies. This isn't just an IT decision.
Where Guardian Fits in Your Implementation
Guardian is part of the governance framework I covered in Chapter 2. When you're planning your Now Assist implementation, Guardian configuration should be on your checklist.
For most organisations, the out of the box Guardian configuration provides reasonable protection. But you should review it, understand what it's doing, and adjust for your specific requirements.
As you move into prompt engineering in Part II, keep Guardian in mind. Good prompts work with the security model, not against it.
That wraps up Chapter 3. You now understand the key architectural components: the layered framework, the Generative AI Controller, the Skill Kit, platform integration points, and Guardian. This foundation sets you up for everything that follows.
In Part II, we'll dive into prompt engineering fundamentals. The theory and techniques that make the difference between prompts that work and prompts that excel.
Last updated