I’ve spent the past week of my downtime exploring Siri Shortcuts in my Stereo Remote app, an iOS app for controlling my AV receiver. I use it because the hardware remote is god-awful, and the software remotes are god-awful. It’s a good app, and I use it many, many times a day. It’s the best solution to the problem, but even so I have often dreamed of being able to control the AV receiver with my voice. Think about the series of steps a user has to take, in a worst-case scenario, to change an input or setting:
- Authenticating to unlock the device
- Heading back to the Home screen
- Finding and selecting the app
- Waiting for the app to be ready
- Waiting for the app to connect to the receiver
- Altering the state of the required input
All of this can be handled invisibly to the user with a single voice command. I have recently talked about how voice takes increased cognitive load to formulate commands, but the resulting voice command here shaves off a good few seconds when compared to our worst-case scenario. On a good network connection, the only latency you notice is the gap between Siri reporting that it is done and the AV receiver reaching its ready state. So it is pretty fast. I should mention as well that this level of control is just one of those sci-fi house-of-the-future things I feel I should be able to tick off my list in 2018. The best bit is I got this without spending loads of cash on a new AV receiver. I just had to make my brain work a bit harder for a couple of nights.
About Siri Shortcuts
First announced and demoed at WWDC 2018, Siri Shortcuts allow the system to present app-suggested shortcut commands to the user. The user can then create their own shortcut phrase to trigger Shortcuts via Siri. These commands can then be understood and carried out by most Siri-capable devices (macOS being an outlier), thanks to the wonders of iCloud syncing.
Overall, Shortcuts are a great new piece of functionality for users. The nerd corners of the Apple ecosystem have been calling out for this for years. Well, maybe not this, but they have been crying out for some better way to integrate Siri with apps. SiriKit, the developer-facing framework, was first released a couple of years ago with iOS 10. But its utility is limited by the number of domains (a domain being an area of knowledge that Siri can handle requests for) provided through the API. This largely limits developers to providing alternatives to existing system apps: messaging, navigation, workouts, media playback. A couple of domains are available for things that don’t really equate to existing system stuff, like ride hailing and payments. These are the ones I can remember it exposing; there are probably a couple more.
So this is where you can see the limitations. Sure, I can order a taxi, but I can’t order a pizza? In my case, say I want to send a command to another device: is that within the messaging domain? Could I get away with that? It’s sort of sending a message? I dunno. Luckily I don’t need to know. The improvements in iOS 12 allow developers to donate actions the user commonly repeats as shortcuts. This is something I can work with, although it’s not without its limitations in its current form. I’ll discuss them a little later.
It’s not something that pertains to my work, but suggested shortcuts can also be utilised by Apple’s Proactive ML thing, showing up on the notification screen if the system decides it’s probable that you would want to use that particular function at that time. They can also show up in search results. Most interesting is how they can be combined with other shortcuts in the Shortcuts app. The Shortcuts app itself is a rebadged and powered-up version of Workflow, the app Apple bought to power automation on iOS. In my case, an example would be letting the user mix stereo settings in with other home automation functions, setting up a movie mode or something. Maybe it orders my favourite gelato from Just Eat too, if Just Eat donates a reorder function.
The Shortcuts app is crazy powerful: you can perform a lot of transformations on an input, as well as use if/else logic. Even so, in my personal use I’ve found little to use it for, because a lot of low-hanging fruit isn’t available in there. That comes down to app integrations. Apps from Apple like Activity don’t integrate with Shortcuts yet. I’d love some shortcuts for that. I expect this to change in time as people learn how to work with Shortcuts and more developers donate Intents in their apps.
Working with Shortcuts
From a developer perspective, the concepts in working with Siri mirror concepts used in popular natural language processing and machine learning tools like Dialogflow, which I’m familiar with. This means that we are working with Intents, where the user has expressed the intention to do something, which is understood and acted on by the system. Intents will have Entities, or parameters, within the user’s query or command that define what action should be taken on the user’s behalf.
This system is flexible in that not every variation on a command has to be a separate Intent. “Play [song name]” would be a very simple example of how an Intent with a parameter keeps down the number of Intents we need to create, while the command stays flexible thanks to the Entity.
(I’m probably going to flip between Entity, which is what I’m used to from Dialogflow, and parameter, which is what I’m used to from Xcode, in this post, so just be ready for that.)
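To make the Intent and Entity idea concrete, here is a minimal sketch in plain Swift. It is not Apple’s Intents API or Dialogflow: `PlayIntent` and `parsePlayIntent` are hypothetical names, and the naive prefix match stands in for real language understanding. It only shows how one Intent plus one parameter covers every “Play [song name]” variation.

```swift
/// Hypothetical Intent: "the user wants to play something".
/// `songName` is the Entity (parameter) carried by the command.
struct PlayIntent: Equatable {
    let songName: String
}

/// Turns an utterance of the form "Play <song name>" into a PlayIntent.
/// Returns nil when the utterance does not match or no song is given.
/// This is a deliberately naive stand-in for real language understanding.
func parsePlayIntent(_ utterance: String) -> PlayIntent? {
    let prefix = "play "
    // Case-insensitive check for the fixed part of the command.
    guard utterance.lowercased().hasPrefix(prefix) else { return nil }
    // Everything after the prefix is treated as the parameter value.
    let song = String(utterance.dropFirst(prefix.count))
    return song.isEmpty ? nil : PlayIntent(songName: song)
}
```

Supporting a new song needs no new Intent here; only the parameter value changes.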
Intents understood by Siri allow all of that functionality to be accessed from within the app, and can allow for confirmation or follow-ups.
So the Intents system provided by Siri now allows flexibility. The limitation comes in at the interface, in how the user interacts with Siri via a shortcut. Siri Shortcuts in themselves do not take an Entity. With a verbal Shortcut, we are triggering a specific, prerecorded action. In my case, this means I can trigger changes between finite states: changing power status or input source is simple, as these are direct commands. But the current implementation of Siri Shortcuts means that I can’t “set volume to 60%”.
A workaround for this is to provide a section within settings that allows for 4 sound settings, defined by the user as the ones they use frequently. Each of these could then be triggered by a single shortcut, such as “set stereo to loud movie”. Whilst it’s annoying that I can’t have that granular control, the reality is I know I have 3 or 4 different volume zones that I may set the stereo to throughout the day. In some ways it expedites what the user wants to do in a simple way, which is good UX, but the failure to comprehend a command a user could reasonably expect to work (after all, they can change source or power) is bad UX.
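The preset workaround could be sketched roughly like this in plain Swift. Everything here is an assumption for illustration: the `VolumePreset` and `PresetStore` names, the list of presets, and the default percentages are made up, and nothing in it talks to Siri or to a receiver. The point is that each shortcut maps to one fixed, parameter-free action, while the number behind it lives in the app’s settings.

```swift
/// Hypothetical named volume zones. Each case is one complete action a
/// spoken shortcut could trigger, with no parameter to fill in.
enum VolumePreset: String, CaseIterable {
    case quiet = "quiet"
    case normal = "normal"
    case loud = "loud"
    case loudMovie = "loud movie"

    /// A phrase a user might record for this preset (illustrative only).
    var suggestedPhrase: String {
        return "set stereo to \(rawValue)"
    }
}

/// Holds the user-editable volume level (0...100) behind each preset.
struct PresetStore {
    // Placeholder defaults; a real app would load these from its settings.
    private var levels: [VolumePreset: Int] = [
        .quiet: 20, .normal: 35, .loud: 50, .loudMovie: 60
    ]

    init() {}

    /// Changes the level behind a preset, clamped to 0...100.
    mutating func set(_ preset: VolumePreset, to percent: Int) {
        levels[preset] = min(max(percent, 0), 100)
    }

    /// The level a shortcut for this preset would apply.
    func level(for preset: VolumePreset) -> Int {
        return levels[preset] ?? 0
    }
}
```

The shortcut itself never carries a number. Calling `set(.loudMovie, to: 65)` on a store changes what “set stereo to loud movie” does without the user having to record a new phrase.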
What this all means in the end is that this is a work in progress. It’s a huge step forwards for Siri, and how we as users can utilise Siri to get shit done a little quicker is fantastic. I’m so impressed with the performance of something I was expecting to be a struggle to implement. Honestly, the bigger struggle was moving half of my code into a shared framework to allow the Intent extensions to access the code, which is necessary because iOS sandboxes extensions for our security. The bonus of the shared framework is that it will allow me to quickly create a widget extension in the Today View of Notification Centre, a task I’d put off due to… not having a shared framework.
The next step for user-facing Siri interaction is clear to me: as a user, I need to be able to set a parameter in my shortcuts. It will be interesting to see how Apple implements this within the current system.
Despite the extra work involved in creating a shared framework, setting up support for Siri Shortcuts has been incredibly easy. The bonus here is that once the framework is set up, you can plug into everything so quickly and easily. At this point I’ll live with the restrictions Shortcuts places on granular volume control. Maybe I should explore the messaging domain? But for now I’m happy with the increased functionality my AV receiver has gained. Even my housemate is impressed, and she hates the AV receiver for how complicated it makes things.