Understanding the Specs
Deep Research using ChatGPT o1-pro, by Rahul Parundekar. Published on 12th April, 2025.
As enterprises embrace AI agents to automate complex workflows, interoperability between these agents has become a critical concern. Google’s Agent2Agent (A2A) protocol – now open-sourced on GitHub – addresses this by defining a standard architecture for agent collaboration across services and vendors. This post provides an in-depth look at A2A’s specifications and an enterprise implementation strategy, organized from high-level concerns (security, deployment, integration) down to core components and technical details. The goal is to give senior engineers a clear roadmap for adopting A2A in an enterprise environment, covering how it secures agent interactions, scales in multi-service ecosystems, integrates with existing infrastructure, and manages the lifecycle of agent tasks.
Security Guarantees and Authentication/Authorization Mechanisms
Security is a foremost design principle of A2A. The protocol is “secure by default,” built to support enterprise-grade authentication and authorization with parity to OpenAPI’s well-known auth schemes (Announcing the Agent2Agent Protocol (A2A)). In practice, this means A2A doesn’t invent new security mechanisms; instead it leverages proven standards:
- Transport Security: All A2A communications occur over HTTPS. Using TLS ensures encryption in transit and server identity verification, preventing eavesdropping or man-in-the-middle attacks. Enterprises should enforce TLS (and can even require mTLS via a service mesh) for any agent-to-agent traffic.
- Authentication Schemes: Each A2A service (agent) can require clients to authenticate using common schemes such as API keys, HTTP Basic, or OAuth2 Bearer tokens, mirroring OpenAPI auth options. The agent’s Agent Card advertises its supported auth methods so clients know how to connect. For example, an Agent Card might specify an `authentication` object listing `"schemes": ["bearer", "apiKey"]` (What is The Agent2Agent Protocol (A2A) and Why You Must Learn It Now). The client must provide valid credentials according to one of these schemes (e.g. include a JWT bearer token in the `Authorization` header or an API key in a query param) when making requests (What is The Agent2Agent Protocol (A2A) and Why You Must Learn It Now). A2A itself doesn’t dictate how to obtain these credentials – it relies on integration with external identity providers (IdPs) or key management systems typical of the enterprise environment.
- Authorization and Access Control: Once authenticated, the remote agent can perform authorization checks as needed. For instance, a bearer token might be a JWT from the company’s IdP (such as Azure AD or Okta) containing roles or scopes; the agent service can verify the token signature and ensure the caller is allowed to invoke the requested skill. Because A2A is an open protocol, enterprises have flexibility to plug in their own access control logic (e.g., checking that the calling service’s identity is on an allowlist for this agent). The protocol’s open-ended design means companies can enforce policies consistent with their security and compliance requirements. Delegated user authorization (where an agent acts on behalf of a specific end-user) is not natively handled by A2A yet – currently it is achieved by passing user-specific tokens via OAuth2, and the A2A team has noted this as an area for future enhancement (Delegated User Authorization for Agent2Agent Servers · Issue #19 · google/A2A · GitHub).
- Agent Identity & Trust: In cross-organization scenarios, the “client agent” must trust the “remote agent” it is calling. A2A uses the Agent Card discovery (described later) to facilitate trust establishment: the card is fetched from a well-known HTTPS endpoint on the agent’s host, so the calling agent can verify it is dealing with the correct service (by domain) and see what auth is required. The Agent Card may in the future include the agent’s public keys or expected credentials format (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). Within an enterprise, additional trust can be enforced via network controls (IP allowlists, mutual TLS between known services, etc.). All these measures ensure that agents “work securely together” as intended (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.), with strong guarantees that only authenticated, authorized parties can trigger actions or access data through A2A.
Deployment Models and Scalability Considerations for Multi-Service Environments
Deploying A2A in an enterprise involves running multiple agent services that communicate over the protocol. The architecture is intentionally decentralized and microservice-friendly, allowing each agent to be deployed independently on whatever infrastructure fits (Kubernetes, VMs, on-prem servers, etc.). Below are key considerations for deployment and scaling:
- A2A Agent Services: In a typical setup, each autonomous agent is implemented as an A2A Server, essentially a web service exposing the A2A API endpoints (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). For example, you might have separate services for a “HR Assistant Agent”, “Finance Approval Agent”, “IT Support Agent”, etc., each with its own code and data but all speaking the A2A protocol. These services can be containerized and deployed behind load balancers or in an orchestration platform (like multiple pods in Kubernetes). The agent’s base URL (serving the Agent Card and receiving requests) is the main thing other agents need to know to interact with it.
- Horizontal Scaling: A2A is stateless at the protocol level – requests are discrete JSON messages – which helps with scaling out. Agent services can run multiple instances for load balancing and high availability. Since tasks have unique IDs, a client can send a task to any instance and, ideally, get the same outcome. However, if the agent maintains conversational context or long-running state for a given task, you need to ensure subsequent messages for that task are handled by the same backend instance or shared state. Enterprises can solve this via sticky sessions (routing all messages with a given Task ID to the same instance) or by using a distributed store (where each instance can retrieve the task’s state/history from a database or cache). The A2A spec allows including conversation history in each request as well (What is The Agent2Agent Protocol (A2A) and Why You Must Learn It Now), meaning an implementation could also pass context explicitly to avoid server-side session reliance. These design choices allow A2A services to scale horizontally without losing track of multi-turn interactions. For simple one-shot tasks, scaling is trivial – any instance can handle any new task – and the stateless, JSON/HTTP nature of A2A makes it easy to distribute load across a cluster.
- Orchestration & Multi-Agent Workflows: In a multi-service environment, you might deploy a hierarchy or network of agents. One common pattern is an Orchestrator Agent (or “agent-of-agents”) that fronts user requests and delegates sub-tasks to specialized remote agents. For example, a user-facing orchestrator might receive a high-level request (“onboard a new employee”) and then use A2A calls to an HR agent, an IT agent, and a Facilities agent in sequence or parallel. The orchestrator itself runs as an A2A server (so it can receive tasks, possibly from a UI or another system) and also acts as an A2A client when calling the others (A2A/samples/python/hosts/README.md at main · google/A2A · GitHub). Such an orchestrator can be scaled and deployed like any service, and the remote agents it calls can scale independently. Enterprise deployment models might include many peer agents calling each other ad-hoc, or a few orchestration hubs mediating most calls – A2A is flexible enough to support both. Google’s internal experience with “large-scale, multi-agent systems” influenced A2A’s design (Announcing the Agent2Agent Protocol (A2A)), so it explicitly supports complex topologies without a single point of failure.
- Performance and Networking: Because A2A builds on HTTP, it works with existing enterprise networking infrastructure. Agents can register DNS entries and sit behind API gateways or load balancers. To minimize latency in high-volume settings, it’s advisable to enable HTTP keep-alive or HTTP/2 for agent communications, so that multiple JSON-RPC calls (and Server-Sent Events streams) reuse connections. In terms of throughput, A2A calls are lightweight JSON messages, but long-running tasks may keep connections open (for SSE streaming) – ensure your infrastructure (proxies, ingress controllers) is tuned to handle potentially many open SSE connections if you use streaming heavily. Internally, the protocol allows concurrent tasks on the same agent – an agent service can process multiple tasks in parallel if its backend logic permits, so scaling vertically (with more threads or async processing) is also possible. Standard scale-out tactics (auto-scaling based on CPU or request rate, multi-zone deployments for resilience, etc.) apply in full. In summary, A2A’s architecture is cloud-native and distributed by design, making it straightforward to deploy a network of cooperating agent services that can grow with enterprise demands.
Integration Requirements for Enterprises
One of A2A’s strengths is that it is designed to meet enterprises where they are, integrating with existing systems rather than requiring a greenfield environment. Here’s how A2A fits in with common enterprise infrastructure:
- Service Mesh and Network Architecture: If your enterprise uses a service mesh (e.g. Istio, Linkerd) or API gateway, A2A traffic can be treated like any internal API call. Because A2A uses plain HTTP requests, the mesh can transparently apply mutual TLS between services, do service discovery, and enforce network policies. For instance, you might use the service mesh’s DNS naming to have agents call `http://finance-agent.mesh.local`, which resolves to the finance agent service. The Agent Card URL (which is typically `https://<host>/.well-known/agent.json`) can also be internal – it doesn’t have to be internet-exposed if all interactions are within your VPN or mesh. Using a mesh, you can also get telemetry for free (requests, latencies) and handle retries or timeouts uniformly. In short, A2A was built “on top of existing, popular standards including HTTP [and] SSE” specifically to make it “easier to integrate with existing IT stacks” (Announcing the Agent2Agent Protocol (A2A)). Your current cloud networking setup will likely require minimal changes to start using A2A between services.
- Identity Providers and SSO: As mentioned in the security section, A2A defers to external identity systems for issuing and validating credentials. In an enterprise setting, this means you will integrate A2A agent authentication with your Identity Provider (IdP) – e.g., OAuth2/OIDC providers like Okta, Auth0, Azure AD, or Google Cloud IAM. A typical setup might involve registering each agent service as a resource server with the IdP and assigning it a client ID or audience, then granting client agents the ability to obtain tokens (via client credentials grant or similar) to call it. When a client agent wants to call a remote agent, it would request a JWT from the IdP that asserts its identity and permissions, and then include that token in the A2A request’s header. The remote agent will verify the token signature and claims before accepting the task. This process is very similar to securing any REST API in the enterprise. Because A2A supports multiple schemes, you could also integrate with legacy systems (for example, an agent might accept a session cookie or a custom HMAC signature if needed), but JWT/OAuth2 is recommended for its interoperability. Importantly, this approach means A2A can leverage Single Sign-On (SSO) frameworks: if an agent ultimately performs an action on behalf of a user, that user’s existing SSO session/token can be propagated to the agent via the client agent, achieving end-to-end user traceability. While not built-in to A2A, such delegated user context can be implemented by passing user tokens through A2A messages or referencing them in the task parameters, allowing enterprises to enforce zero-trust principles (every request is authenticated and authorized in context).
- Access Control and Policy Enforcement: Enterprises often have central access control policies (for data governance, compliance, etc.). A2A does not hard-code any specific policy engine, but it provides the hooks to enforce policies. For example, an organization might stipulate that only certain departments’ agents can invoke a sensitive HR agent. This can be enforced by the HR agent requiring a token with an appropriate role claim, or by an API gateway in front of the HR agent’s A2A endpoint checking the caller’s identity against a policy (e.g., using an OPA – Open Policy Agent – rule). Additionally, because the Agent Card lists an agent’s capabilities and required auth, an enterprise could develop a directory service or governance layer that scans all agent cards and flags any that have insecure configurations (e.g., an agent that allows `apiKey` auth, which might be less secure, could be disallowed on the corporate network). Another integration point is with logging and SIEM systems: since A2A uses standard protocols, it’s straightforward to log all A2A requests (the HTTP requests can be logged, and since they carry task IDs and agent IDs, you get an audit trail of which agent asked which agent to do what). These logs can feed into security monitoring systems to detect anomalies (like an unusual agent calling pattern that might indicate misuse). In summary, enterprises can treat A2A interactions as they would any API calls in terms of governance – using API management platforms, gateways, and monitoring to apply their access controls and audit requirements.
- Working with Existing Application Ecosystems: A2A is vendor-neutral and meant to complement existing automation frameworks. It’s not a replacement for tools like service buses or RPC frameworks, but rather a specialization for agentic AI interactions. You can deploy A2A alongside existing service meshes, message queues, or RPC systems. For example, if you already have a Kafka event bus for microservices, A2A can be an overlay for agent communication on top of that – an agent might produce events to Kafka as part of task completion, or vice versa, but A2A itself remains point-to-point between agents. Similarly, if you use gRPC internally, you could wrap an agent’s gRPC API with an A2A facade so that external agents (or agents written in other stacks) can talk to it via the open protocol. The guiding idea is that A2A provides a common language for agent collaboration (Announcing the Agent2Agent Protocol (A2A)), enabling “agents from different vendors to interact and share context” across enterprise platforms (SAP and Google Cloud Are Advancing Enterprise AI | SAP News Center). Thus, integration is often about mapping or bridging A2A to your existing systems where necessary, while gradually adopting it as a unifying layer for new agent development.
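To illustrate the kind of policy check described above, here is a hedged Python sketch of the claims evaluation an agent (or a gateway in front of it) might run after token signature verification. Signature checking itself is assumed to have happened already (e.g. with a JWT library against the IdP’s JWKS); the claim names (`dept`, `scope`), the scope string, and the allowlist are all hypothetical:

```python
# Illustrative authorization check on already-verified JWT claims.
import time

ALLOWED_DEPTS = {"hr", "people-ops"}   # policy: only these may call the HR agent

def authorize(claims, required_scope="a2a:tasks.send"):
    """Return True if the caller may submit tasks to this agent."""
    if claims.get("exp", 0) <= time.time():
        return False                               # token expired
    if required_scope not in claims.get("scope", "").split():
        return False                               # missing scope
    return claims.get("dept") in ALLOWED_DEPTS     # department allowlist

claims = {"sub": "finance-agent", "dept": "hr",
          "scope": "a2a:tasks.send", "exp": time.time() + 300}
print(authorize(claims))  # True
```

The same logic could equally be expressed as an OPA/Rego rule evaluated at the gateway, keeping the agent code policy-free.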
Core Components and Data Flow of the A2A Architecture
Let’s break down A2A’s architecture into its core components and describe how data flows through the system. At a high level, A2A involves two roles in any interaction: a Client Agent (which initiates a request on behalf of a user or higher-level goal) and a Remote Agent (which receives the request and acts on it) (Announcing the Agent2Agent Protocol (A2A)). These roles correspond to specific components and data structures defined by the A2A spec:
- Agent Card: This is a public JSON metadata document (often served at the URL `https://<agent-host>/.well-known/agent.json`) that each agent service provides for discovery (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). Think of it as the agent’s “capabilities profile” or a business card. It typically includes the agent’s name or ID, the base URL for its A2A endpoint, supported authentication schemes, a list of its capabilities/skills, and other metadata like supported content formats. Clients retrieve the Agent Card of a remote agent as the first step before calling it. For example, the Agent Card of a finance agent would tell a client how to talk to it: that it must use a Bearer token, that it can stream results, and that it provides a skill to approve expenses. In enterprise use, the Agent Card can be seen as an API contract for the agent service (analogous to an OpenAPI spec, but much simpler and focused on high-level capabilities and auth). Clients can cache agent cards or keep a registry of known agents for faster discovery.
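A hypothetical Agent Card along these lines (field names are illustrative, loosely following the A2A samples rather than quoting the normative schema) might be:

```json
{
  "name": "Finance Approval Agent",
  "description": "Reviews and approves expense requests",
  "url": "https://finance-agent.internal.example.com/a2a",
  "authentication": { "schemes": ["bearer"] },
  "capabilities": { "streaming": true, "pushNotifications": false },
  "skills": [
    {
      "id": "approve-expense",
      "name": "Expense Approval",
      "description": "Approves or rejects an expense report against policy"
    }
  ]
}
```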
- A2A Server (Agent Service): An A2A Server is simply an agent implementation that listens for incoming tasks over HTTP and speaks the A2A protocol (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). Internally, this could be powered by anything – a large language model (LLM), rule-based logic, a database of knowledge, or calls to other APIs – but externally it exposes a uniform interface. The A2A Server’s job is to receive a task from a client, process it (which may involve multiple steps or calls in the backend), and then return results and status updates according to the protocol. It maintains state for ongoing tasks (at least in memory or storage) so it knows, for example, if a task is waiting for input or has produced partial output. In terms of deployment, as discussed, each A2A Server can run as a microservice.
- A2A Client: Any entity that calls an A2A Server using the protocol is acting as an A2A Client (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). This could be another agent service or even a non-agent application (for instance, a traditional app that wants to delegate a subtask to an AI agent via A2A). The client role involves discovering the remote agent’s card, composing the task request (including initial message and a new task ID), and handling responses or events from the remote agent. Often, an agent will be both a server and a client (server to those who call it, client to the agents it calls). The A2A specification and provided libraries handle much of the client logic (e.g., performing the HTTP POST, opening an SSE stream if needed, etc.), so developers can integrate this without building from scratch.
- Task: In A2A, a Task is the fundamental unit of work that agents collaborate on (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). A task is usually initiated by a client agent to request some operation from a remote agent. Each task has a unique Task ID (provided by the client when creating it) and goes through a defined lifecycle of states. The states include: `submitted` (task received), `working` (agent is processing it), `input-required` (agent is pausing, awaiting additional input from the client, e.g. needs clarification), and a terminal state which can be `completed` (success), `failed` (error), or `canceled` (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.) (A2A/llms.txt at main · google/A2A · GitHub). The Task object encompasses metadata about the task’s progress, its final outcome, and any outputs (artifacts). In practice, when a client sends a task request, the response (or subsequent events) will contain a Task representation that includes fields like the task `id`, current `status` (with state and maybe a message or error info), and any accumulated conversation history or results.
- Message: Agents communicate within a task by exchanging Messages, which represent individual turns in a conversation or commands/responses in an interaction (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). Each message has a `role` (either `"user"` for messages coming from the client side, or `"agent"` for messages from the agent side) and a content payload. Rather than treating the content as a raw string, A2A structures messages into Parts.
- Part: A Part is the smallest unit of content in A2A – it can be thought of as a typed chunk of data inside a message (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). A single message may contain one or multiple parts. The protocol defines a few part types:
  - TextPart: plain text content.
  - FilePart: binary file content (with either an inlined byte payload or a URI pointing to the file).
  - DataPart: structured data like JSON objects (used for forms, or any machine-readable content).
  Each part includes a MIME-like type descriptor. For example, a DataPart might be labeled as `application/json` with an actual JSON object as its value. This design allows multimodal and structured data exchange: an agent’s message could include a text explanation, plus an image (as a FilePart), plus a form or table of results (as a DataPart), all in one coherent response. By breaking messages into typed parts, A2A enables rich interactions beyond just text (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). It also facilitates user experience negotiation – the client and agent can agree on content formats. For instance, if a client only supports text and images but not video, it can indicate that in its Agent Card or initial message, and the remote agent will stick to those modalities. (The A2A spec highlights this as “Each message includes ‘parts’… allowing client and remote agents to negotiate the correct format… e.g., iframes, video, web forms, and more” (Announcing the Agent2Agent Protocol (A2A)).)
- Artifact: An Artifact represents an output or by-product of a task (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). If messages are the turns in a dialog, artifacts are more like final deliverables or files produced. For example, if an agent was asked to generate a report, the final PDF might be an artifact. Artifacts also consist of Parts (just like messages) – so an artifact could contain, say, a FilePart (the file content or link) or DataParts (structured results). The distinction between a message and an artifact in A2A is subtle: messages are exchanged between agents during the task, whereas artifacts are typically results of the task that might be presented to the end-user or stored. In practice, the A2A Server will often send artifacts either at task completion or via streaming updates as they are ready. The client agent can retrieve or receive artifacts and then decide how to use them (e.g., show to a user or forward to another system).
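To make the nesting of these objects concrete, here is an illustrative Python rendering of Task, Message, Part, and Artifact as dataclasses. This is a sketch of the shapes described above, not the official schema (the spec defines these as JSON structures):

```python
# Illustrative shapes of the core A2A objects (not the normative schema).
from dataclasses import dataclass, field

@dataclass
class Part:
    type: str          # "text" | "file" | "data"
    content: object    # str, bytes/URI, or a JSON-like dict

@dataclass
class Message:
    role: str          # "user" (client side) or "agent"
    parts: list

@dataclass
class Artifact:
    name: str
    parts: list        # artifacts are built from Parts, just like messages

@dataclass
class Task:
    id: str
    state: str = "submitted"  # submitted|working|input-required|completed|failed|canceled
    history: list = field(default_factory=list)    # Messages exchanged so far
    artifacts: list = field(default_factory=list)  # outputs produced

task = Task(id="task-123")
task.history.append(Message(role="user",
                            parts=[Part("text", "Approve expense report #42")]))
task.state = "working"
```

The JSON wire format maps onto these shapes directly: a Task response carries its `id`, a `status` with the current state, and lists of messages and artifacts.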
Typical Data Flow: Given those components, here’s how a full cycle of A2A interaction usually flows:
- Discovery: The client agent locates the remote agent’s Agent Card (by making an HTTP GET request to the well-known `/.well-known/agent.json` endpoint on the agent’s host or otherwise obtaining the JSON) (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). It parses the Agent Card to learn what the agent can do, what URL to call, and what auth to use. For example, the client might see that the agent has a skill it needs (say “expense approval”) and requires a bearer token – so the client ensures it has a valid token ready.
- Initiation: The client formulates a task request to send. It generates a new Task ID (often a UUID or application-specific unique string) to identify this task. It then sends an HTTP POST request to the remote agent’s A2A endpoint (the exact URL might be the base URL from the card or a specific path like `/a2a`; the protocol supports a JSON-RPC interface which we detail later). The body of this request includes a `tasks/send` method call with the Task ID, an initial Message (usually representing the end-user’s query or command that kicked off the process), and any other parameters (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). At this point, the task is created on the remote agent’s side, and the agent will respond to acknowledge it has the task.
- Processing: The remote agent’s A2A Server begins working on the task. What happens next depends on whether the client requested a streaming interaction or a simple synchronous call:
  - Non-Streaming (Synchronous): The client can use `tasks/send` in a call-and-wait fashion. In this case, the remote agent will process the entire task and then return a final Task object in the HTTP response. The Task will likely have `status.state = "completed"` (or `"failed"` if something went wrong) and include the agent’s answer in the `status.message` or `artifacts` (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). Essentially, the client gets the result in one go, similar to a normal API call.
  - Streaming (Asynchronous): For longer or interactive tasks, A2A supports streaming updates. The client would use the `tasks/sendSubscribe` method to initiate, indicating it wants Server-Sent Events (SSE) for progress (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). In this mode, the initial HTTP response might just confirm the task was accepted, and then the remote agent will stream events over an SSE connection as the task progresses. These events include TaskStatusUpdateEvent messages (with updated status and possibly partial messages from the agent) and TaskArtifactUpdateEvent messages (when an artifact is produced) (A2A/llms.txt at main · google/A2A · GitHub). Streaming mode is crucial for long-running jobs – for example, an agent performing a research task for hours could send periodic status updates (“50% complete…”) or intermediate results. It’s also used for real-time AI interactions, like an agent generating a long textual answer and streaming it token-by-token. The SSE channel stays open until the task reaches a final state, at which point a final event (marked with `"final": true`) is sent and the stream can close (A2A/llms.txt at main · google/A2A · GitHub).
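Client-side, the streaming mode boils down to reading `data:` lines off the open SSE connection until a final event arrives. A minimal illustrative parser, assuming each `data:` line carries one JSON event with an optional `"final": true` flag as described above (a real client would read these lines from the HTTP response):

```python
# Illustrative SSE handling for an A2A task stream.
import json

def read_events(sse_lines):
    """Yield parsed events from SSE 'data:' lines until the final event."""
    for line in sse_lines:
        if not line.startswith("data:"):
            continue                      # skip comments / keep-alives
        event = json.loads(line[len("data:"):].strip())
        yield event
        if event.get("final"):
            break                         # task reached a terminal state

stream = [
    'data: {"status": {"state": "working"}}',
    'data: {"status": {"state": "completed"}, "final": true}',
]
states = [e["status"]["state"] for e in read_events(stream)]
# states == ["working", "completed"]
```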
- Interaction (Optional): Many tasks involve multiple back-and-forth steps, especially if the remote agent needs more information. If the remote agent cannot complete the task without clarification or additional input, it can pause the task in an `input-required` state (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). This means the agent is waiting for the client (or ultimately the end-user) to provide something – e.g., “Please confirm you want to proceed with approval.” The client agent detects this state (via the status update or response) and can then send another `tasks/send` request with a follow-up message, referencing the same Task ID to continue the conversation (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). In essence, the task becomes a session of dialogue. The protocol ensures all these messages are tied to the task context and appended to the task history. The client and agent repeat this request/response cycle until the agent has what it needs to finish the task. (Not all tasks require this – many will complete immediately or stream results without explicit confirmation – but A2A allows interactive workflows when needed, which is powerful for complex use cases.)
- Completion: Eventually, the remote agent signals that the task is finished – either successfully (`state = "completed"`), with a failure (`"failed"`), or because the client canceled it (`"canceled"`). At this point, the final results are available to the client. In non-streaming usage, the final results would have been in the HTTP response; in streaming usage, the final SSE event will carry the completion status and any final artifact or message (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). The client agent can then take whatever action is appropriate with the outcome – e.g., present it to a user, trigger another process, or perhaps initiate new A2A tasks as a result. It’s worth noting that A2A includes a cancellation mechanism: a client can invoke `tasks/cancel` on a task if it is no longer needed (for example, the user aborted the operation) (A2A/llms.txt at main · google/A2A · GitHub). Upon cancellation, the remote agent should cease work and mark the task as canceled. Also, if connectivity is lost during streaming, the spec provides a `tasks/resubscribe` method to resume listening to updates on a task without losing information (A2A/llms.txt at main · google/A2A · GitHub). These features ensure robust lifecycle management in real-world conditions (network hiccups, users changing their minds, etc.).
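The lifecycle above can be summarized as a small state machine. The sketch below encodes one plausible reading of the legal transitions – the state names come from the spec, but the exact transition table is our illustration, not normative text:

```python
# Illustrative A2A task state machine (transition set is an assumption).
TERMINAL = {"completed", "failed", "canceled"}
TRANSITIONS = {
    "submitted": {"working", "completed", "failed", "canceled"},
    "working": {"input-required", "completed", "failed", "canceled"},
    "input-required": {"working", "canceled", "failed"},
}

def advance(state, new_state):
    """Validate and apply a task state transition."""
    if state in TERMINAL:
        raise ValueError(f"task already finished: {state}")
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

s = advance("submitted", "working")
s = advance(s, "input-required")   # agent pauses, awaiting clarification
s = advance(s, "working")          # client replied via tasks/send
s = advance(s, "completed")        # terminal: no further transitions allowed
```

A monitoring dashboard tracking tasks across agents could use exactly this kind of table to flag tasks stuck in non-terminal states.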
(Announcing the Agent2Agent Protocol (A2A)) Figure: Illustrative A2A communication flow between a Client Agent and a Remote Agent. The protocol enables secure collaboration through task and state management, capability discovery (via Agent Cards), and user experience negotiation (negotiating data formats and modalities).
Throughout this flow, A2A abstracts away the internal complexities of each agent. The client doesn’t need to know how the remote agent fulfills the task – only how to speak the common protocol to request and receive the result. All state transitions are well-defined, so both sides can synchronize on the task status. This structured lifecycle is crucial in enterprise settings: it allows higher-level orchestration and monitoring (for instance, an enterprise dashboard could track all tasks across agents and see which are pending or failed, since each task has a state machine). In the next section, we will delve deeper into the actual API and schema specifications that make the above flow possible.
Protocols, Schemas, and APIs Defined in the Spec
Under the hood, A2A leverages established web protocols and a JSON-based schema to standardize agent communication. Here we outline the key technical specifics: the transport protocols in use, the API endpoints/methods, and the important data schemas (objects) defined by A2A.
- Use of HTTP + JSON-RPC 2.0: A2A is implemented on top of HTTP with a JSON payload protocol. Specifically, it adopts JSON-RPC 2.0 as the envelope for requests and responses (How the Agent2Agent Protocol (A2A) Actually Works: A Technical Breakdown | Blott Studio). JSON-RPC is a lightweight RPC protocol where a client sends a JSON object containing a `method` name, parameters, and an ID, and the server replies with a JSON object containing a `result` or an `error` and the same ID. In A2A’s case, methods are things like `"tasks/send"` or `"tasks/cancel"`. This design choice means A2A can be implemented as a single HTTP endpoint (e.g., `POST /a2a`) that accepts different method calls in the JSON body, rather than requiring multiple distinct REST endpoints for each action. It simplifies integration in that one route can handle many actions, and it aligns with how many AI APIs (like OpenAI’s) use a single endpoint with a payload determining the operation. For developers, it’s straightforward to use – a single `curl` POST of a `tasks/send` envelope is enough to initiate a task on, say, a Finance Approval agent. The JSON-RPC `id` is the task ID (so that the response or events correlate), and the `params` include the same ID plus the initial message. JSON-RPC also supports batched calls, though in the A2A context that’s less common (you usually deal with one task at a time). The response carries a Task object; for a quick task it would indicate completion, with the agent’s reply in the status message. If the task was long-running and the client used `tasks/sendSubscribe`, the initial result might only confirm submission, and subsequent events would stream.
Core API Methods: The A2A spec defines a set of standard methods (the RPC calls) that all compliant agents should implement (A2A/llms.txt at main · google/A2A · GitHub). The primary ones include:
- `tasks/send`: Submit a task (no streaming). The result is the final Task object (or an immediate error).
- `tasks/sendSubscribe`: Submit a task and subscribe for updates via SSE. The result is delivered as a stream of events rather than in the HTTP response.
- `tasks/get`: Query the current state of a task by ID (A2A/llms.txt at main · google/A2A · GitHub). Useful if a client didn't use SSE and wants to poll for completion, or to recover state at any point.
- `tasks/cancel`: Request cancellation of a running task (A2A/llms.txt at main · google/A2A · GitHub). The server attempts to abort the task and returns a Task object showing it was canceled (or an error if it was too late to cancel).
- `tasks/pushNotification/set`: Register a webhook URL for push notifications on a task (A2A/llms.txt at main · google/A2A · GitHub). This is an alternative to SSE: the client provides an endpoint (and optionally an auth token or credentials) that the server should call when there are updates. If the client can't hold open an SSE connection (imagine a backend service without long polling, or an agent that wants out-of-band notifications), the server can instead POST updates to the given URL. The Agent Card's capabilities field indicates whether the server supports `pushNotifications` (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). After calling `tasks/pushNotification/set`, the server invokes the provided callback with the same kind of events it would have sent over SSE. Typically the client supplies a one-time token or key in the config so it can verify the authenticity of callbacks (What is The Agent2Agent Protocol (A2A) and Why You Must Learn It Now).
- `tasks/pushNotification/get`: Retrieve the current webhook configuration for a task (A2A/llms.txt at main · google/A2A · GitHub) (not commonly needed except for debugging).
- `tasks/resubscribe`: Re-initiate an SSE subscription for a task, for example if the original SSE connection dropped (A2A/llms.txt at main · google/A2A · GitHub).

Together, these APIs provide a full suite for task management: creation, status checking, cancellation, and asynchronous delivery. They are the "operations" of the A2A protocol, whereas the Agent Card is the "description" of capabilities.
-
JSON Schemas and Data Structures: The A2A repository includes a formal JSON Schema (in `a2a.json`) that defines all the objects and types used in the protocol. Important structures include:
- AgentCard schema: Defines fields like `agentId`, `url`, `capabilities` (booleans for features like `streaming`, `pushNotifications`, etc.), `authentication` (with subfields to list auth schemes, as shown earlier), `skills` (an array of objects, each describing a skill with an ID, description, and example usage), and optional metadata like `description` or contact info. This schema ensures agent cards are uniform and can be parsed by client libraries (What is The Agent2Agent Protocol (A2A) and Why You Must Learn It Now).
- Task schema: Describes what a Task object contains. Key fields are `id` (task ID), `status` (a TaskStatus object), `history` (list of past Message objects in this task's conversation), `artifacts` (list of Artifact objects produced so far), and possibly `metadata` (for additional info or extension fields). The TaskStatus includes a `state` (the lifecycle state) and may include a latest `message` (e.g., the agent's last reply) or an `error` if the state is failed.
- Message schema: Contains `role` (user or agent), `parts` (array of Part objects), and optional `metadata` (which could include timestamps or other tags).
- Part schema: Each Part has a `type` (such as "text", "file", or "data") and fields corresponding to that type. For TextPart: a `text` string. For FilePart: a `name` and either embedded `bytes` (base64 data) or a `uri` to fetch it, plus a `mediaType` (like image/png or application/pdf) identifying the content. For DataPart: a `data` field (any JSON) and perhaps a schema reference or `format`. This flexible schema allows any kind of content to be packaged in a message.
- Event schemas: The TaskStatusUpdateEvent and TaskArtifactUpdateEvent structures (used in SSE or push) include the `id` of the task and either a new status or an artifact, plus a boolean `final` flag indicating whether it is the last event for that task (A2A/llms.txt at main · google/A2A · GitHub).
- Error schema: Follows the JSON-RPC error format, but A2A defines custom error codes for agent-specific conditions (e.g., TaskNotFoundError, code `-32001`, if an unknown task ID is referenced, or TaskNotCancelableError, `-32002`, if you try to cancel a task that is already done) (A2A/llms.txt at main · google/A2A · GitHub). This helps clients handle errors programmatically (such as retrying a send if the task was not found, or informing the user if cancellation failed because it was too late).

All these schemas are documented in the A2A spec to ensure interoperability. An agent implemented in Java, for instance, can use the JSON schema to generate classes, while a Python implementation can use Pydantic or similar to validate messages. Versioning is also captured here: the Agent Card has a `version` field (as seen in the example, e.g., "1.0.0") which likely corresponds to the protocol or agent version. The A2A spec designers included protocol versioning to allow evolving the standard without breaking existing agents (How the Agent2Agent Protocol (A2A) Actually Works: A Technical Breakdown | Blott Studio). Thus, an Agent Card can communicate which protocol version it supports, and clients can adjust accordingly or maintain backward compatibility.
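To illustrate, the Message and Part shapes can be mirrored as lightweight Python types. This is a hand-rolled sketch (in practice you would generate classes from `a2a.json` with Pydantic, datamodel-code-generator, or similar); the field names follow the schema descriptions above:

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Stand-ins for classes normally generated from a2a.json. Field names mirror
# the spec's Part and Message schemas; treat this as a sketch, not the
# canonical types.

@dataclass
class Part:
    type: str                        # "text", "file", or "data"
    text: Optional[str] = None       # set for TextPart
    uri: Optional[str] = None        # set for FilePart (alternative: base64 bytes)
    mediaType: Optional[str] = None  # e.g. "application/pdf" for FilePart
    data: Optional[Any] = None       # set for DataPart (arbitrary JSON)

@dataclass
class Message:
    role: str                        # "user" or "agent"
    parts: list[Part] = field(default_factory=list)
    metadata: Optional[dict] = None  # e.g. timestamps or other tags

# Example: a user message with a single text part.
msg = Message(role="user", parts=[Part(type="text", text="Summarize Q3 revenue.")])
```

Generated (rather than hand-written) types are preferable in production, since they stay in lockstep with the published schema.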
-
Capability Negotiation: The protocol includes several points of negotiation to ensure client and server agree on how to communicate:
- Content modalities: Through `defaultInputModes` and `defaultOutputModes` in the Agent Card, an agent states what content types it expects and produces (for example, `["text", "text/plain"]` indicates it primarily handles plain text). Similarly, the Parts in messages have explicit types. This means that if a client agent can only handle text, it will ignore or downgrade any binary parts it receives, or it may choose not to call an agent that only returns images. This is part of what the spec calls "user experience negotiation": making sure the format of interaction fits the capabilities of both sides (Announcing the Agent2Agent Protocol (A2A)).
- Skill selection: If an agent card lists multiple skills, a client chooses which skill to invoke by context. There isn't a special field in the API to pick a skill; the client simply formulates the request (task message) appropriate to that skill. Future extensions might include a more direct skill query or invocation mechanism. The idea is that the Agent Card's skill list helps the client decide which agent to send a task to. For example, an orchestrator might look at all known agents' cards to route a user request to the one with the matching skill (capability discovery).
- MCP Integration: Although not the focus here, A2A is designed to complement Anthropic's Model Context Protocol (MCP). The two serve different purposes (MCP provides context to AI models, while A2A handles agent-to-agent communication), but they can work together (Google's Agent2Agent interoperability protocol aims to standardize agentic communication | VentureBeat). The A2A spec documentation mentions an "A2A and MCP" topic which likely explains how an agent might use MCP internally and still expose A2A externally (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.). For example, an agent could be an MCP server (taking user prompts with additional context) and also an A2A server so it can accept tasks from other agents. Interoperability with MCP and other standards is part of the protocol's openness.
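The capability-discovery idea above (routing a request to an agent whose card advertises the right skill and modality) can be a few lines of Python. The card fields used (`url`, `skills`, `defaultInputModes`) follow the Agent Card schema described earlier; the filtering logic itself is illustrative, not part of the spec:

```python
def route_by_skill(agent_cards: list[dict], skill_id: str, mode: str = "text") -> list[str]:
    """Return URLs of agents whose card advertises the given skill and input mode."""
    matches = []
    for card in agent_cards:
        skill_ids = {s.get("id") for s in card.get("skills", [])}
        modes = card.get("defaultInputModes", [])
        if skill_id in skill_ids and mode in modes:
            matches.append(card["url"])
    return matches

# Hypothetical cards an orchestrator might have fetched from /.well-known URLs.
cards = [
    {"url": "https://agents.example.com/finance", "defaultInputModes": ["text"],
     "skills": [{"id": "expense-approval"}]},
    {"url": "https://agents.example.com/vision", "defaultInputModes": ["image/png"],
     "skills": [{"id": "chart-analysis"}]},
]
# route_by_skill(cards, "expense-approval") -> ["https://agents.example.com/finance"]
```

A real orchestrator would also weigh the card's auth requirements and streaming capabilities before dispatching a task.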
In summary, the A2A specification provides a complete JSON schema and a set of RPC-style APIs that any implementation can follow. By using familiar technologies (HTTP, JSON) and patterns (well-known URL for discovery, JSON-RPC calls for actions, SSE for events), it lowers the barrier for integration. The learning curve for an engineer used to REST or RPC APIs is minimal – A2A just formalizes the patterns needed for multi-agent systems. Next, we’ll consider how to observe and manage such a system over time, which is crucial for production deployments.
Observability and Lifecycle Management
Operating a network of agents in production requires visibility into what they’re doing and control over their lifecycles. A2A’s structured approach to tasks lends itself well to observability, and it provides hooks for managing task lifecycles. Here’s how enterprises can monitor and manage A2A-based agents:
-
Task Lifecycle Monitoring: Because every A2A task has a defined lifecycle state, it's straightforward to instrument monitoring around those states. Agents can expose metrics like "number of tasks in progress" (count of tasks in the `working` state), "tasks completed vs. failed", and task durations (time from `submitted` to `completed`). These can feed into dashboards to track system health. For example, if a particular agent shows many tasks stuck in `working` or `input-required` for a long time, that may indicate a problem (perhaps it's waiting on input that never comes, or an external dependency is slow). The A2A protocol itself doesn't dictate monitoring tools, but since the interactions are over HTTP, you can use distributed tracing and logging as you would with microservice calls. Attaching a trace ID to each task (the Task ID or a separate correlation ID) and propagating it when one agent calls another allows end-to-end tracing of a multi-agent workflow in tools like Jaeger or Zipkin.
-
Logging and Auditing: Each agent service can log A2A requests and responses (with appropriate sanitization for sensitive data). Because tasks have unique IDs, logs across services can be correlated by task ID. Consider a scenario: Agent A calls Agent B with Task ID 123; Agent B then calls Agent C with Task ID 456 as a sub-task; Agent A’s logs will have references to 123, Agent B’s logs will show it handling 123 and then initiating 456, and so on. By stitching these, you can audit a complex multi-agent transaction. The history field in the Task object (which contains the conversation messages) is also useful for auditing – an enterprise could store final task history for a record of what was asked and answered. This is especially important in regulated industries: if agents are making decisions (say approving a loan), you need an audit trail. A2A makes it possible to retain that context. Some enterprise users might build a central log or database of all tasks and their outcomes for offline analysis or compliance.
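As a sketch of the state-based metrics mentioned above, a monitoring hook could aggregate Task objects by their lifecycle state before exporting them as gauges (assuming the Task/TaskStatus shapes described earlier):

```python
from collections import Counter

def task_state_metrics(tasks: list[dict]) -> Counter:
    """Aggregate tasks by lifecycle state for a dashboard or metrics exporter."""
    return Counter(t["status"]["state"] for t in tasks)

# Hypothetical snapshot of tasks, shaped like the Task schema's id/status fields.
tasks = [
    {"id": "t1", "status": {"state": "working"}},
    {"id": "t2", "status": {"state": "completed"}},
    {"id": "t3", "status": {"state": "working"}},
    {"id": "t4", "status": {"state": "failed"}},
]
metrics = task_state_metrics(tasks)
# metrics["working"] == 2 -- could be exported as a gauge such as a2a_tasks_in_progress
```

From here, wiring the counts into Prometheus gauges or a logging pipeline is routine.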
-
Server-Sent Events and Push for Observability: When using SSE for streaming, those events not only provide real-time updates to the client agent, but they can also be tapped for monitoring. For instance, an SSE event stream could be mirrored to a monitoring service that displays active tasks and their latest status. The A2A sample web UI (the Multi-Agent Web App in the repo) demonstrates this by showing a live event list and task list during agent conversations (Google A2A - a First Look at Another Agent-agent Protocol | HackerNoon) (Google A2A - a First Look at Another Agent-agent Protocol | HackerNoon). Enterprises could similarly create a “control center” that subscribes (perhaps as a privileged client) to certain agent events. Alternatively, using push notifications, one could have a monitoring agent that registers as a listener for all tasks (if the implementation allows multiple subscribers or a broadcast) – though out of the box, push config is per task for a single client. In any case, the streaming nature of A2A means you don’t have to poll continuously to know what’s happening; you can react to events. This event-driven model is great for building responsive dashboards and alerting systems (e.g., trigger an alert if a task fails).
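A minimal sketch of consuming those SSE updates follows. The parser handles only `data:` lines carrying JSON events; real SSE streams also use `event:`/`id:` fields and multi-line data, which a production client should handle:

```python
import json

def parse_sse_events(stream_text: str) -> list[dict]:
    """Parse the `data:` lines of an SSE stream into JSON event objects."""
    events = []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

# Hypothetical status-update events, shaped like TaskStatusUpdateEvent.
raw = (
    'data: {"id": "task-123", "status": {"state": "working"}, "final": false}\n'
    '\n'
    'data: {"id": "task-123", "status": {"state": "completed"}, "final": true}\n'
)
events = parse_sse_events(raw)
# The `final: true` flag on the last event signals the end of updates for the task.
```

A monitoring mirror could feed the same parsed events into a dashboard while the client agent consumes them.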
-
Lifecycle Control (Tasks): We've covered how cancellation is supported (`tasks/cancel`), allowing systems to terminate tasks that are no longer needed. This is crucial for resource management: you wouldn't want an agent continuing a costly operation if the user already got an answer or the workflow pivoted. Another aspect is timeouts. The protocol itself doesn't enforce global timeouts, but a client agent can decide to cancel a task that is taking too long. On the server side, an agent implementer might impose an internal timeout (mark a task as failed if it exceeds X minutes, for example) to avoid runaway processes. If such a timeout occurs, the agent can update the task state to `failed` and include an error message like "Timed out". Because the client might be listening via SSE or will eventually call `tasks/get`, it will learn of this failure. Consistent handling of states makes it easier to manage these scenarios systematically.
-
Lifecycle Management (Agent Services): Beyond individual tasks, enterprises will manage the lifecycle of the agent services themselves: deploying new versions, adding or removing agents, and so on. A2A aids this by decoupling agents: as long as an agent's Agent Card is reachable and it speaks the protocol, others can use it. To update an agent, you can deploy a new version at the same URL (updating the `version` field in the Agent Card). If you need to decommission an agent or change its endpoint, you update any orchestration logic or directories that pointed to it. In a dynamic environment, a discovery service lets agents find each other without hard-coding URLs; for example, an enterprise might maintain a registry mapping agent names to their current base URLs and Agent Cards (think of it as a mini service registry specifically for A2A agents). This is not part of the A2A spec, but a practical add-on for larger deployments. If the A2A ecosystem grows, one could imagine a more automated discovery mechanism or even agent directories becoming part of the standard.
-
Observability of Performance and Bottlenecks: With multiple agents calling each other, it's important to know where time is spent. If a user's request passes through five agents, the latency adds up. Using the task state timestamps (each status update can carry a timestamp) and trace logs, you can measure how long each agent took to do its work. For example, if Agent B is consistently slow, that signals a need to scale it up or optimize its logic. Similarly, if an agent often goes into the `failed` state, monitoring can catch the trend and trigger investigation (perhaps its integration with some external system is broken). Standard APM (Application Performance Management) tools integrate readily, since A2A doesn't hide the fact that these are HTTP calls: you can instrument them as you would any API.
-
Error Handling and Recovery: A robust enterprise implementation will include strategies for error handling. A2A provides error codes for common issues (method not found, invalid params, and so on in JSON-RPC, plus task-specific errors) (A2A/llms.txt at main · google/A2A · GitHub). Client agents should handle errors gracefully: for instance, if a call to a remote agent fails authentication (say, a 401 Unauthorized), the client could refresh its token and retry. If a task fails because the remote agent couldn't fulfill it (perhaps for lack of information), the client might try an alternative agent or ask a human. These decision trees are beyond the spec, but A2A gives you the information needed to implement them (clear status, error messages). For recovery, if an agent service goes down mid-task, the client can detect the broken connection and use `tasks/resubscribe` when the agent comes back, or re-submit the task if needed. Since tasks are identified by unique IDs, a re-submission with the same ID could either resume (if the server saved state) or be recognized as a duplicate; more commonly, a new attempt would use a new ID.
-
Enterprise Lifecycle (Versioning and Compatibility): As A2A evolves, new features might be added. The protocol's versioning scheme and the `version` field in Agent Cards allow gradual upgrades. For example, suppose A2A 1.1 defines a new part type for audio streaming; an agent supporting 1.1 will list it, and older 1.0 agents will ignore that part if received. The A2A team has emphasized compatibility so that "older agent versions work alongside newer ones", protecting investments in agent development (How the Agent2Agent Protocol (A2A) Actually Works: A Technical Breakdown | Blott Studio). Enterprise architects should still plan for consistent upgrades (likely by updating agent services and their cards in a controlled fashion), but they can do so incrementally. An A2A client library might negotiate or adjust to the lowest common version between itself and a remote agent.
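The task-specific error codes mentioned earlier (-32001 for an unknown task, -32002 for a task that can no longer be canceled) lend themselves to a simple client-side dispatch. A hedged sketch; the recovery actions are illustrative policy, not part of the spec:

```python
# A2A reuses the JSON-RPC error envelope and adds task-specific codes,
# e.g. -32001 (TaskNotFoundError) and -32002 (TaskNotCancelableError).
TASK_NOT_FOUND = -32001
TASK_NOT_CANCELABLE = -32002

def classify_error(response: dict) -> str:
    """Map a JSON-RPC response to a client-side recovery action."""
    error = response.get("error")
    if error is None:
        return "ok"
    code = error.get("code")
    if code == TASK_NOT_FOUND:
        return "resubmit"          # unknown task ID: safe to start a fresh task
    if code == TASK_NOT_CANCELABLE:
        return "already-finished"  # too late to cancel; fetch the final result instead
    return "report"                # surface other errors to an operator

resp = {"jsonrpc": "2.0", "id": "t1", "error": {"code": -32001, "message": "Task not found"}}
# classify_error(resp) -> "resubmit"
```

In practice this dispatch would sit inside a client library, alongside retry and backoff logic for transport-level failures.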
In conclusion, A2A not only standardizes how agents talk to each other, but also makes it easier to observe and manage those conversations at scale. By structuring interactions as tasks with states and using ubiquitous protocols, it integrates smoothly with enterprise DevOps practices. Teams can apply their existing tools for monitoring, logging, and security to this new world of AI agent communication. The specification’s completeness (covering discovery, messaging, streaming, error codes, etc.) means engineers have a clear blueprint for building reliable, interoperable agent services.
Sources: The information in this post is based on Google’s public A2A specification and documentation (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.) (Announcing the Agent2Agent Protocol (A2A)) (What is The Agent2Agent Protocol (A2A) and Why You Must Learn It Now) (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.), as well as insights from early analyses of the protocol (How the Agent2Agent Protocol (A2A) Actually Works: A Technical Breakdown | Blott Studio) (A2A/llms.txt at main · google/A2A · GitHub). All code examples and flows adhere to the behaviors described in the open-source A2A repository (GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.) (A2A/llms.txt at main · google/A2A · GitHub). As the A2A project is open and rapidly evolving, readers are encouraged to refer to the latest official spec on the A2A GitHub for the most up-to-date details and best practices.