Not really. For the most part, accessibility APIs provide programmatic interfaces to user interfaces, whereas application APIs provide semantically meaningful interfaces to application functionality.
A closer analogue would be AppleScript, or rather the underlying Apple Event and Open Scripting Architecture functionality that the OS supplied to support AppleScript. It allowed applications to expose these interfaces along with metadata documenting them, and it allowed external tools to record manually performed tasks across applications as programs expressed in terms of those interfaces, which made the interfaces easier to use. (Recording, while not strictly required, is convenient, and especially useful for less technical users.)
If you're familiar with VBA in Microsoft Office applications, it was sort of like that, except with support provided by OS APIs that any application could use if it chose to implement scripting support, official guidance from Apple suggesting that all well-designed applications should be scriptable and recordable, and application design patterns and frameworks designed with scriptability and recordability in mind.
Note that I use the past tense here, despite AppleScript still being available in macOS, because it is not well-supported by modern applications.
Do you want to do something that can't be done through AppleScript, macOS accessibility APIs, and something like Puppeteer to control the browser?
Or something you don't understand how to do manually?
Because I guess I don't understand the attraction of using an LLM for system automation where interfaces already exist, other than as a form of documentation, or to write code using these interfaces.
The nicest thing about this rush to find and build "agentic" endpoints for controlling everything is that there's no reason these same endpoints can't be consumed by deterministic, non-LLM software as well.
It feels like 1994 called, and it's giving me my AppleScript back.
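To make the point about deterministic consumers concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `rename_file` tool, the `tools/list` and `tools/call` method names, and the in-process `handle_request` dispatcher are hypothetical stand-ins for whatever "agentic" endpoint an application might expose, loosely shaped like JSON-RPC 2.0 requests. It is not any particular product's API.

```python
import json

# Hypothetical tool registry: tool name -> description and handler.
# A real endpoint would live in another process; this one is in-process
# so the sketch stays self-contained.
TOOLS = {
    "rename_file": {
        "description": "Compute a new file name with a prefix added.",
        "handler": lambda args: {"new_name": args["prefix"] + args["name"]},
    },
}


def _error(request_id, code, message):
    """Build a JSON-RPC 2.0 style error response string."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}
    )


def handle_request(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 style request string and return the response string."""
    req = json.loads(raw)
    method = req["method"]
    params = req.get("params", {})
    if method == "tools/list":
        # Metadata describing the interface, usable by an LLM or by a person.
        result = [{"name": n, "description": t["description"]} for n, t in TOOLS.items()]
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(req["id"], -32602, "unknown tool")
        result = tool["handler"](params.get("arguments", {}))
    else:
        return _error(req["id"], -32601, "method not found")
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


def call_tool(name, arguments, request_id=1):
    """Deterministic client: no model in the loop, just a fixed request."""
    raw = json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )
    response = json.loads(handle_request(raw))
    if "error" in response:
        raise RuntimeError(response["error"]["message"])
    return response["result"]


if __name__ == "__main__":
    # An ordinary script driving the same endpoint an agent would use.
    for file_name in ["a.txt", "b.txt"]:
        print(call_tool("rename_file", {"prefix": "2024-", "name": file_name}))
```

Nothing in `call_tool` cares whether the caller is a model or a for loop, which is roughly the property AppleScript-style scripting interfaces had.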
From applications that capture the screen or use accessibility APIs, perhaps, but what about, e.g., Windows applications that capture window messages, e.g.,
Obviously, if you can inject code into a process that receives sensitive data, you're already running in a context where all security bets are off.
But with processes you yourself create, you probably can, even without elevated privileges, unless the application takes measures to prevent injection (akin to game anticheat mechanisms), so it seems worth pointing out that there are simple mechanisms to subvert such "protected" fields that don't require application-specific reverse engineering.
— Stephen Colbert, 2006
https://www.c-span.org/clip/white-house-event/user-clip-step...