Stochastic service composition for ʟᴛʟf reachability and safety tasks

Giuseppe De Giacomo; Marco Favorito; Luciana Silo; Giuseppe De Giacomo; Marco Favorito; Luciana Silo

doi:10.48130/ker-0026-0003

Service composition à la Roman model consists of realizing a virtual service by suitably orchestrating a set of already available services. In this paper, we consider a variant where available services are stochastic systems, and the target specification is goal-oriented and specified in Linear Temporal Logic on finite traces (ʟᴛʟ_f). In this setting, we are interested in synthesizing a controller (policy) that maximizes the probability of satisfying the temporal logic objective (either reachability or safety) while minimizing the expected cost of using the available services. To do so, we combine techniques from ʟᴛʟ_f synthesis, service composition à la Roman Model, and bi-objective lexicographic optimization on Markov Decision Processes (MDPs). This framework has several interesting applications, including Smart Manufacturing and Digital Twins.

HTML

1. Introduction

The service-oriented computing (SOC) paradigm uses services to support the development of rapid, low-cost, interoperable, evolvable, and massively distributed applications. Services are considered autonomous, platform-independent entities that can be described, published, discovered, and loosely coupled in novel ways^[1]. Service composition, i.e., the ability to generate new, more useful services from existing ones, is an active ﬁeld of research in the SOC area and has been actively investigated for over two decades.

Particularly interesting in this context is the so-called Roman Model^[2−4] where services are conversational, i.e., have an internal state and are procedurally described as finite transition systems (TS), where at each state the service offers a certain set of actions, and each action changes the state of the service in some way. The designer is interested in generating a new service, called target, which is described as the other service; however, it is virtual in the sense that no code is associated with its actions. So, for executing the target, one has to delegate each of its actions to some of the available services by suitably orchestrating the services, considering the current state of the target and the current states of the available services. Service composition amounts to synthesizing a controller that can suitably orchestrate the executions of the available services to guarantee that the target actions are always delegated to some service that can actually execute them in its current state.

Recently, a renewed interest in service composition à la Roman Model has emerged from its potential applications in smart manufacturing. In particular, through digital twins technology, manufacturing devices can export their behaviour as transition systems, and hence be orchestrated in very much the same way as services did back in the early 2000s^[5−7].

These technological trends align with the vision of smart manufacturing, which advocates the need for flexible, modular, and interoperable system architectures. Modern manufacturing systems often consist of heterogeneous components, ranging from robotic workstations to cyber-physical subsystems, that communicate through standardized APIs. In such settings, each component can be naturally abstracted as a service. Consequently, adopting the Service-Oriented Architectures (SOAs) approach enables the dynamic composition and coordination of such services, leveraging their exposed behaviors while maintaining loose coupling and scalability.

In this context, service composition à la Roman Model provides a useful abstraction for orchestrating device behavior toward achieving temporally extended goals. By treating each device or software module as a service with internal dynamics, and expressing high-level objectives via Linear Temporal Logic on finite traces (ʟᴛʟ_f), our framework bridges declarative specification with executable orchestration.

Interestingly, these new applications are also pointing out several variations that are not typically considered in earlier literature on services. First, they advocate for considering a stochastic behaviour of services, such as those studied in Yadav et al.^[8] and Brafman et al.^[9]. Unlike the classical model, in which the target specification can either be satisfied or not with no middle ground, in the stochastic setting, it is possible to define a notion of 'approximate solution' in case an exact one does not exist. Second, the notion of goal-oriented target specification is increasingly championed ^[5,6,10]. That is, instead of having the target specified as a transition system, it is specified as a (possibly temporally extended) goal that the composition has to fulfill. Of particular interest are specifications in ʟᴛʟ_f^[11], which are at the base of declarative process specification in Business Process Management (BPM) through the so-called declare paradigm^[12−14]. Third, apart from satisfying the target, it is of interest to also minimize the cost coming from the service utilization^[15−17]. This concern, together with the satisfaction of the target, calls for resorting to Multi-Objective Optimization for computing a solution.

A first attempt to do so may resort to Multi-Objective MDPs (MOMDPs)^[18]. One common solution is to reduce the multi-objective reward/cost optimization to a single reward optimization via a linear weighting of different sources of rewards/costs. However, this means that the two objectives, namely the maximization of target rewards and the minimization of cost uses, are blurred into one scalar value, which hides precious information from the agent. Instead, the maximization of the target objective has the highest priority. Among those strategies that maximize the first objective, we aim to find those strategies that achieve the minimum utilization cost. In the literature of multi-objective optimization, the setting in which there is a strict preference order among objectives is called lexicographic multi-objective optimization^[19−22]. It is known that, in general, single Markovian rewards cannot capture certain multi-objective tasks, such as ones with lexicographic preferences^[23]; hence, such problems cannot be easily reduced to standard techniques on MDPs^[24].

Among related works, one of the earliest attempts at combining stochastic planning models with service composition is a study by Gao et al.^[25]. There are works based on Markov-HTN Planning^[26], multi-objective optimization^[27,28], and lexicographic optimization^[29], helpful to model the stochastic behavior as well as complex QoS preferences. However, in all cases, either there are no stateful services, no high-level declarative specification of the desired solution, or no strict preference among objectives.

1.1. Contributions
In this paper, we address the three above requirements and study goal-oriented stochastic service composition, where, as goals, we adopt arbitrary ʟᴛʟ_f specifications, under process-traces semantics, where only a single action is executed at a time, as commonly done in reasoning about business processes in the declare framework^[12]. Specifically, we are given a goal specification, and we want to synthesize an orchestrator that, on the one hand, proactively chooses actions to form a sequence that satisfies the goal and, on the other hand, delegates each action to an available service in such a way that at the end of the sequence, all services are in their final states. The key contribution of this work extends the findings of De Giacomo et al.^[30], including both reachability and safety tasks, introducing a novel reduction technique for safety specification. The composition problem consists of maximizing the satisfaction probability of the ʟᴛʟ_f objective, and conditioned on this, minimizing the expected cost of utilization of the services.

We consider two variants of the ʟᴛʟ_f stochastic service composition problem. In the first variant of the problem, as the first objective, we aim to maximize the probability of satisfying at least once an ʟᴛʟ_f formula $ \varphi $. This is a reachability property: there exists a finite prefix $ t_{<k} $ of an infinite trace $ t $ such that $ t_{<k} \models \varphi $. This is the classical use ofʟᴛʟ_f to specify synthesis tasks^[31]. As the second objective, we consider the minimization of the expected cost of utilization of services, conditioned on the optimal satisfaction of the first objective. In the second variant, as the first objective, we aim to maximize the probability of not violating an ʟᴛʟ_f formula $ \varphi $. This is a safety property: every finite prefix $ t_{<k} $ of an infinite trace $ t $ is such that $ t_{<k} \models \varphi $. This is the classical use of ʟᴛʟ_f to specify environment behaviors^[32]. As the second objective, we consider minimizing the expected mean cost of utilization of services, conditioned on the optimal satisfaction of the first objective.

Observe that our reachability and safety ʟᴛʟ_f properties are conceptually similar to those formalized in studies by Aminof et al.^[33,34], where they use $ \exists \varphi $ to denote reachability and $ \forall \varphi $ to denote safety, where $ \varphi $ is an arbitrary ʟᴛʟ_f formula. However, the difference in our reachability and safety tasks within our composition framework lies in the fact that, in the satisfaction condition of these tasks, we also impose constraints on whether the services are in an accepting state; thus, we consider both the satisfaction of the ʟᴛʟ_f task and the configurations of the service community.

The solution technique for both variants of the composition problem relies on solving a bi-objective lexicographic optimization^[35] over a special MDP, allowing for minimizing the services' utilization costs while guaranteeing maximum probability of goal satisfaction. We point out that although this paper has mainly a foundational nature, it also has a significant applicative interest since it gives the foundations and solution techniques of goal-oriented compositions, which are indeed envisioned in the current literature on smart manufacturing where the notion of goal-oriented target specification is increasingly championed^[5−7,10].

This paper presents an approach for stochastic service composition to find orchestrators that maximize the probability of satisfying the reachability and safety ʟᴛʟ_f tasks and, among such orchestrators, find those who also minimize a cost utilization function. We extended our previous work^[30] by (i) generalizing the framework so as to encompass both variants of stochastic service composition, one for reachability and the other for safety ʟᴛʟ_f tasks; (ii) including a new section regarding the composition under ʟᴛʟ_f safety tasks; (iii) discussed in more detail a case study as a conceptual validation of our framework.

1.2. Outline
The rest of the paper is structured as follows. Section 2 explains the theoretical concepts on which the paper is based. Section 3 introduces the service composition framework in stochastic settings. Section 4 studies the stochastic composition problem in the case of the reachability task and provides a solution by reducing the problem to a bi-objective lexicographic optimization on MDPs^[35]. Section 5 studies the stochastic composition problem in the case of a safety task and provides reduction to bi-objective lexicographic optimization on MDPs^[35]; the reduction is analogous to the reachability case, but with several technical differences that require a deeper analysis. Section 6 shows the conceptual validation of the formal framework to an industrial case study of an electric motor assembly process, describing in detail the manufacturing goal and the manufacturing actors. Finally, Section 7 concludes the paper with final remarks and future work.

2. Preliminaries

2.1. Linear Temporal Logic on finite traces (ʟᴛʟ_f)

ʟᴛʟ_f is a variant of Linear Temporal Logic (ʟᴛʟ) interpreted over finite traces, instead of infinite ones^[11]. Given a set $ {\cal{P}} $ of atomic propositions, ʟᴛʟ_f formulas $ \varphi $ are defined by

$ \varphi :: = {tt} \mid p \mid \lnot \varphi \mid \varphi \wedge \varphi \mid \bigcirc\varphi \mid \varphi \;{\cal{U}}\; \varphi $

where $ {tt} $ represents the true formula, $ p $ denotes an atomic proposition in $ {\cal{P}} $, $ \bigcirc$ is the next operator, and $ \;{\cal{U}}\; $ is the until operator. We use abbreviations for other Boolean connectives, as well as the following: eventually as $ \Diamond \varphi \equiv true \;{\cal{U}}\; \varphi $; always as $ \square \varphi \equiv \lnot \Diamond \lnot \varphi $; weak next as $ {\LARGE \bullet } \varphi \equiv \lnot {\bigcirc} \lnot\varphi $ (note that, on finite traces, $ \lnot {\bigcirc} \varphi $ is not equivalent to $ {\bigcirc} \lnot \varphi $); and weak until as $ \varphi_1 \;{\cal{W}}\; \varphi_2 \equiv (\varphi_1 \;{\cal{U}}\; \varphi_2 \lor \Box \varphi_1) $ ($ \varphi_1 $ holds until $ \varphi_2 $ or forever). In the original proposal, ʟᴛʟ_f formulas were interpreted on finite traces. For simplicity, here we consider the semantics on simple (possibly empty) finite traces, also known as process traces^[36] (while the choice between finite-trace and process-trace semantics does not affect the generality of our framework, we prefer process-trace semantics as it aligns more naturally with our context. In this case, the traces result from the services' actions, which are more effectively represented as atomic propositions rather than in a factorized form). A process trace over propositions $ {\cal{P}} $ is a sequence $ a = a_0 \dots a_{l-1} $ where $ a_i\in {\cal{P}} $, and $ l $ is the length of the trace. The empty trace is denoted with $ \epsilon $.

Given a finite (possibly empty) trace $ a $, we define when an ʟᴛʟ_f formula $ \varphi $ is true in $ a $ at the instant $ i\in\{0, \dots, l-1\} $, denoted by $ a,i\models\varphi $, by inductively considering subformulas as follows^[37]:

● $ a,i\models {tt} $;

● $ a,i \models p $ iff $ a_i = p $;

● $ a,i\models\lnot\varphi $ iff $ a, i \not\models \varphi $;

● $ a, i \models \varphi_1 \wedge\varphi_2 $ iff $ a, i \models \varphi_1 $ and $ a, i \models \varphi_2 $;

● $ a, i \models {\bigcirc} \varphi $ iff $ i<l-1 $ and $ a, i + 1 \models \varphi $;

● $ a, i \models \varphi_1 \;{\cal{U}}\; \varphi_2 $ iff there exists $ j $ with $ i \ge j \lt l $ such that $ a, j \models \varphi_2 $, and for all $ k $ with $ i \le k \lt j $ we have that $ a, k \models \varphi_1 $.

Whenever $ a,0\models\varphi $ holds, we say that $ a $ is a model of $ \varphi $, or that $ a $ satisfies $ \varphi $. We also write $ a\models\varphi $ as an abbreviation for $ a,0\models\varphi $.

A deterministic finite automaton (dfa) is a tuple $ {\cal{A}} = \langle {\cal{P}}, Q, q_0, F, \delta\rangle $ where: (i) $ {\cal{P}} $ is the alphabet, (ii) $ Q $ is a finite set of states, (iii) $ q_0 $ is the initial state, (iv) $ F\subseteq Q $ is the set of accepting states, and (v) $ \delta:Q\times {\cal{P}} \to Q $ is a total transition function. Note that the DFA alphabet is the same as the set of traces that satisfy the formula $ \varphi $. A trace $ a $ is accepted by a $ {\rm{DFA}}$ $ {\cal{A}} $ iff a sequence of states $ q_0,\dots,q_l $ exists in $ Q $ such that: (i) $ q_{i+1}\in\delta(q_i,a_i) $ for $ i = 0,\dots,l-1 $; and (ii) $ q_l\in F $. We denote with $ {\cal{L}}( {\cal{A}}) $ the set of traces accepted by $ {\cal{A}} $. An ${\rm{LTL}} _f $ formula can be transformed into an equivalent DFA $ {\cal{A}}_\varphi $ in at most 2EXPTIME^[11].

2.2. Markov Decision Process (MDP)

A Markov Decision Process (MDP) is a tuple M = (S, A, P, s₀), where: (i) S is a finite set of states, (ii) s₀ is the initial state, (iii) A is a finite set of actions, and (iv) $ P : (S \times A) \to \Delta(S) $ is the transition probability function, i.e. a mapping from state-action pairs to probability distributions over S. With Supp(d) , we denote the support of a probability distribution $ d $. An infinite path $ \rho \in (S \times A)^\omega $ is a sequence $ \rho = s_0a_1s_1a_2 \dots $, where $ s_i \in S $, $ a_{i+1} \in A $, and $ s_{i+1}\in {\text{Supp}}(P(s_i,a_{i+1})) $, for all $ i \in {\mathbb{N}} $. Similarly, a finite path $ \rho \in (S \times A)^* \times S $ is a finite sequence $ \rho = s_0a_1s_1a_2 \dots a_{m}s_m $. For any path $ \rho $ of length at least $ j $ and any $ i \leq j $, we let $ \rho[i: j] $ denote the subsequence $ s_ia_{i+1}s_{i+1}a_{i+2} \dots a_{j}s_j $. Moreover, let $ {\text{last}}(\rho) = s_m $ to denote the last state visited by the finite path $ \rho $. We use $ {\text{Paths}}_ {\cal{M}}^\omega = (S \times A)^\omega $ and $ {\text{Paths}}_ {\cal{M}} = (S \times A)^* \times S $ to denote the set of infinite and finite paths, respectively. A policy $ \pi : {\text{Paths}}_ {\cal{M}} \rightarrow A $ maps a finite path $ \rho \in {\text{Paths}}_ {\cal{M}} $ to an action $ a\in A $. We denote with $ {\text{Paths}}_{ {\cal{M}}_\pi} $ the set of finite paths over $ {\cal{M}} $ whose actions are compatible with $ \pi $. Given a finite path $ \rho = s_0a_1 \dots a_ms_m $, the cylinder of $ \rho $, denoted by $ {\text{Paths}}_{ {\cal{M}}}^\omega(\rho) $, is the set of all infinite paths starting with prefix $ \rho $. The $\sigma$-algebra associated with MDP $ {\cal{M}} $ and a fixed policy $ \pi $ is the smallest $\sigma$-algebra that contains the cylinder sets $ {\text{Paths}}_{ {\cal{M}}_\pi}^\omega(\rho)$ for all $\rho \in {\text{Paths}}_{ {\cal{M}}_\pi}$. For a state $s$ in $S$, a measure is defined for the cylinder sets as $ {\mathbb{P}}_{ {\cal{M}}_\pi,s}( {\text{Paths}}^\omega_{ {\cal{M}}_\pi}(s_0 a_1 s_1 \dots a_m s_m)) = \prod_{k = 0}^{m-1} P(s_{k+1}|s_k,a_{k+1}) $ if $ s_0 = s $ and for all $ k $, $ a_{k+1} = \pi(\rho[0: k]) $, otherwise $ 0 $. We also have $ {\mathbb{P}}_{ {\cal{M}}_\pi,s}( {\text{Paths}}^\omega_{ {\cal{M}}_\pi}(s)) = 1$ and $ {\mathbb{P}}_{ {\cal{M}}_\pi,s}( {\text{Paths}}^\omega_{ {\cal{M}}_\pi}(s')) = 0$ for $s'\neq s$. This can be extended to a unique probability measure $ {\mathbb{P}}_{ {\cal{M}}_\pi,s}$ on the aforementioned $\sigma$-algebra. In particular, if $ {\cal{R}} \subseteq {\text{Paths}}_{ {\cal{M}}_\pi}$ is a set of finite paths forming pairwise disjoint cylinder sets, then $ {\mathbb{P}}_{ {\cal{M}}_\pi,s}(\bigcup_{\rho \in {\cal{R}}} {\text{Paths}}^\omega_{ {\cal{M}}_\pi}(\rho)) = \sum_{\rho \in {\cal{R}}} {\mathbb{P}}_{ {\cal{M}}_\pi,s}( {\text{Paths}}^\omega_{ {\cal{M}}_\pi}(\rho)) $. We denote with $ {{\mathbb{E}}}_{ {\cal{M}}_{\pi},s}[X] $ the expected value of a random variable $ X $ with respect to the distribution $ {\mathbb{P}}_{ {\cal{M}}_{\pi},s} $.

3. Stochastic service composition framework

4. Composition of stochastic services for reachability ʟᴛʟ_f tasks

In this section, we present our service composition framework in stochastic settings where the goal is to maximize the probability of satisfying the ʟᴛʟ_f reachability task of $ \varphi $, where $ \varphi $ is an ʟᴛʟ_f formula. Intuitively, we are interested in orchestrators that maximize the probability of satisfaction of the ʟᴛʟ_f specification, even when the specification cannot be surely satisfied (e.g., due to stochastic misbehavior of some service). Moreover, while guaranteeing the optimal probability of satisfaction, we aim to find those orchestrators that minimize the expected utilization cost of the services, conditioned on the achievement of the task.

Formally, we consider a ʟᴛʟ_f formula $ \varphi $ expressed over the set of actions $ A $, and consider a community of $ n $ services $ {\cal{C}} = \{ {\cal{S}}_1, \dots, {\cal{S}}_n\} $, where each set of actions $ A_i\subseteq A $. We say that some finite execution $ h $ is successful (for $ {\cal{C}} $ and $ \varphi $), denoted with $ {\text{successful}}_{ {\cal{C}},\varphi}(h) $, if there exist some execution $ h'\in {\text{prefixes}}(h) $ such that the following two conditions hold: (1) $ {\text{actions}}(h')\models\varphi $ and (2) all service states $ \sigma_i \in {\text{last}}( {\text{states}}(h')) $ are such that $ \sigma_i\in F_i $. If for execution $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}} $ there exist a finite prefix history $ h\in {\text{prefixes}}(t) $ such that $ {\text{successful}}_{ {\cal{C}},\varphi}(h) $ for $ \varphi $, we say that $ t $ is successful for $ \varphi $. Finally, we say that an orchestrator $ {\gamma} $ realizes the ${\rm{LTL}} _f $ reachability task $ \varphi $ with $ {\cal{C}} $ if, for all traces $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}} $, $ t $ is successful with respect to $ \varphi $. When $ {\cal{C}} $ and $ \varphi $ are clear from the context, we omit the suffix in $ {\text{successful}}_{ {\cal{C}},\varphi} $ and only write $ {\text{successful}} $.

Example 2 Continuing example 1, we consider the following sequential task: $ {clean} $ the garden by picking fallen leaves and removing dirt, $ {water} $ the plants, and $ {pluck} $ the ripe fruits and flowers. The action $ {clean} $ must be performed at least once, followed by $ {water} $ and $ {pluck} $ in any order. In ${\rm{LTL}} _f $, the task can be expressed as

$ \varphi = {clean} \wedge {\bigcirc}({clean} \;{\cal{U}}\;(({water}\land {\bigcirc}{pluck})\lor({pluck}\land {\bigcirc}{water}))) $

A successful execution is

$ \begin{array}{l} h_1 = \langle (a_0,b_0,c_0), ({clean}, {\cal{B}}_1), (a_0, b_0, c_0), ({water}, {\cal{B}}_2), (a_0, b_0, c_0),\\ \;\;\;\;\;\;\;\; ({pluck}, {\cal{B}}_2), (a_0, b_1, c_0),({empty}, {\cal{B}}_2), (a_0, b_0, c_0)\rangle \end{array} $

Note that, before the action $ ''{empty}'' $ is taken, the ʟᴛʟ_f task $ \varphi $ is already satisfied; however, the garden bot $ {\cal{B}}_2 $ is not in an accepting state. The last action in the execution, $ ''{empty}'' $, is necessary to move the bot $ {\cal{B}}_2 $ to its accepting state. Another successful execution is

$ \begin{array}{l} h_2 = \langle (a_0,b_0,c_0), ({clean}, {\cal{B}}_1), (a_0, b_0, c_0), ({pluck}, {\cal{B}}_3), (a_0, b_0, c_1), \\ \;\;\;\;\;\;\;\;\;({empty}, {\cal{B}}_3), (a_0, b_0, c_0), ({water}, {\cal{B}}_2), (a_0, b_0, c_0)\rangle \end{array} $

On the other hand, any extension of the execution

$ h_3 = \langle(a_0,b_0,c_0), ({clean}, {\cal{B}}_1), (a_0, b_0, c_0), ({pluck}, {\cal{B}}_3), (a_0, b_0, c_2)\rangle $

will not be successful since service $ {\cal{B}}_3 $ can never reach one of its accepting states from $ c_2 $.

The satisfaction probability of the reachability task $ \varphi $ under orchestrator $ {\gamma} $ and community $ {\cal{C}} $ is given by:

$ {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}}( {\gamma}) = {\mathbb{P}}_{ {\gamma}, {\cal{C}}}\bigg(\big\{t \in {\cal{T}}_{ {\gamma}, {\cal{C}}} \,\big\vert\, {\text{successful}}(t)\big\}\bigg) $

(2)

Intuitively, $ {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}}( {\gamma}) $ is the probability measure (under $ {\mathbb{P}}_{ {\gamma}, {\cal{C}}} $) of the set of successful executions generated by executing the orchestrator $ {\gamma} $ over the service community $ {\cal{C}} $.

Moreover, we define the (conditioned) expected utilization cost of services as the expected cost an orchestrator incurs in its successful executions, i.e.:

$ {\cal{J}}^{ {{\text{cost}}}}_{ {\cal{C}},\varphi} ( {\gamma}) = {\mathbb{E}}_{h\sim {\mathbb{P}}_{ {\gamma}, {\cal{C}}}}\Bigg[ \sum\limits_{k = 1}^{|h|} C_{o_k}(\sigma_{o_k,k-1},a_{k}) \bigg\vert\ {\text{successful}}(h) \Bigg] $

(3)

Let $ \Gamma( {\cal{C}}) $ be the set of orchestrators for the community $ {\cal{C}} $. Let $ f: \Gamma( {\cal{C}}) \to {\mathbb{R}} $ be an objective function. We say an orchestrator $ {\gamma}\in\Gamma( {\cal{C}}) $ is $ f $-optimal if $ f( {\gamma}) = \sup_{\tau\in \Gamma( {\cal{C}})} f(\tau) $, and write $ \Gamma_f $ for the set of $ f $-optimal orchestrators.

Finally, we define our optimization problem. We want to compute an orchestrator $ {\gamma} $ such that the following holds:

$ {\gamma}\in\Gamma_{ {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}}} \text{ and } {\cal{J}}^{ {{\text{cost}}}}_{ {\cal{C}},\varphi}( {\gamma}) = \inf\limits_{ {\gamma}'\in\Gamma_{ {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}}}} {\cal{J}}^{ {{\text{cost}}}}_{ {\cal{C}},\varphi}( {\gamma}') $

(4)

Intuitively, we fix a lexicographic order on the objective functions $ {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}} $ and $ {\cal{J}}^{ {{\text{cost}}}}_{ {\cal{C}},\varphi} $, meaning that we aim to minimize the expected utilization cost to satisfy the specification, conditioned to the satisfaction of the specification, while guaranteeing the optimal probability of satisfying it. Interestingly, in case the specification is exactly realizable, the notion of optimal orchestrator according to Eq. (4) coincides with the notion of realizability, as shown in the following results.

First, let $ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}} $ be the set of finite executions $ h $ of orchestrator $ {\gamma} $ on service community $ {\cal{C}} $ that start from $ \sigma_{10}\dots\sigma_{n0} $ such that (i) $ {\text{successful}}(h) $ holds, and (ii) there is no prefix history $ h'\in {\text{prefixes}}(h) $ such that $ {\text{successful}}(h') $ holds:

$\begin{split} {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}} =\;& \{ h\in {\cal{H}}_{ {\gamma}, {\cal{C}}}: {\text{successful}}(h) \wedge \not\exists h'\in {\text{prefixes}}(h) \text{ s.t. } h'\\\neq\;& h \wedge {\text{successful}}(h) \}\end{split} $

(5)

Intuitively, such a set only contains the finite executions $ h $ such that they are successful for $ \varphi $ for the first time. In fact, by definition of $ {\cal{H}}^\varphi_{ {\gamma}, {\cal{C}}} $, if $ h\in {\cal{H}}^\varphi_{ {\gamma}, {\cal{C}}} $ then $ {\text{successful}}(h) $ holds.

Lemma 1 The following equality holds:

$ \big\{t \in {\cal{T}}_{ {\gamma}, {\cal{C}}} \,\big\vert\, {\text{successful}}(t)\big\} = \bigcup\limits_{h\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) $

Proof We prove the two set inclusions (i) $ \subseteq $ and (ii) $ \supseteq $ separately.

(i) if $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}} $ and $ {\text{successful}}(t) $, then there exist a prefix $ h\in {\text{prefixes}}(t) $ such that $ h $ is successful. Let $ h' $ be the shortest of such prefixes: this means that no prefix $ h" $ of $ h' $ is successful. From this, by definition of $ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}} $, it follows that $ h'\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}} $. Then, given that $ t $ is an execution of $ {\gamma} $ over $ {\cal{C}} $ with prefix $ h' $, by definition of $ {\cal{T}}_{ {\gamma}, {\cal{C}}} $, we have that $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}}(h') $, and therefore the thesis holds.

(ii) let $ t\in\bigcup_{h\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) $. Let $ h\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}} $ be the history such that $ t\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}} $, which must exist by assumption. By definition of $ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}} $, we have $ {\text{successful}}(h) $, and by definition of $ {\text{successful}} $, since $ h\in {\text{prefixes}}(t) $, we also have $ {\text{successful}}(t) $. Moreover, $ {\cal{T}}_{ {\gamma}, {\cal{C}}}(h)\subseteq {\cal{T}}_{ {\gamma}, {\cal{C}}} $, which concludes the proof. □

It is crucial to observe that since by definition of $ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}} $ there is no pair $ h',h"\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}(h) $ such that $ h'\in {\text{prefixes}}(h") $, all trace sets $ {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) $ for $ h\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}} $ are pairwise disjoint sets, which means that $ {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}}( {\gamma}) $ is a well-defined probability.

Lemma 2 If $ {\gamma} $ realizes the reachability task $ \varphi $ over $ {\cal{C}} $, then $ \bigcup_{h\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) = {\cal{T}}_{ {\gamma}, {\cal{C}}}(\langle(\sigma_{10}\dots\sigma_{n0})\rangle) $.

Proof We prove (i) $ \bigcup_{h\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) \subseteq {\cal{T}}_{ {\gamma}, {\cal{C}}}(\langle(\sigma_{10}\dots\sigma_{n0})\rangle) $ and (ii) $ \bigcup_{h\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) \supseteq {\cal{T}}_{ {\gamma}, {\cal{C}}}(\langle(\sigma_{10}\dots\sigma_{n0})\rangle) $ separately. Proposition (i) is immediate: every execution belongs to the set of executions starting from $ \sigma_{10}\dots\sigma_{n0} $. To prove proposition (ii), we start by observing that for all $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}}(\langle(\sigma_{10}\dots\sigma_{n0})\rangle) $, by definition of realizing orchestrator, they have a prefix $ h'\in {\text{prefixes}}(h) $ that is successful. In particular, if $ h" $ is the shortest prefix of $ h $ that is successful, then $ h"\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}} $ and $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}}(h") $. This implies that $ t\in\bigcup_{h"\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h") $.□

Theorem 1 Let $ {\cal{C}} $ be a community of stochastic services, and $ \varphi $ be a reachability task. The orchestrator $ {\gamma} $ realizes $ \varphi $ with community $ {\cal{C}} $ iff $ {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}}( {\gamma}) = 1 $.

Proof ($ \Rightarrow $) If an orchestrator $ {\gamma} $ realizes $ \varphi $, then all infinite executions $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}} $ are successful. By Lemma 1, this implies that $ t\in \bigcup_{h\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) $. By Lemma 2, this set is equal to $ {\cal{T}}_{ {\gamma}, {\cal{C}}}(\langle(\sigma_{10}\dots \sigma_{n0})\rangle) $. Since by definition $ {\text{Supp}}( {\mathbb{P}}_{ {\gamma}, {\cal{C}}}) \subseteq $ $ {\cal{T}}_{ {\gamma}, {\cal{C}}}(\langle(\sigma_{10}\dots\sigma_{n0})\rangle) $, we have that $ {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}}( {\gamma}) = {\mathbb{P}}_{ {\gamma}, {\cal{C}}}(\bigcup_{h\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h)) = {\mathbb{P}}_{ {\gamma}, {\cal{C}}}( {\cal{T}}_{ {\gamma}, {\cal{C}}}(\langle(\sigma_{10}\dots \sigma_{n0})\rangle)) = 1 $.

($ \Leftarrow $) Assume an orchestrator $ {\gamma} $ is such that $ {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}}( {\gamma}) = 1 $. This implies that for all infinite executions $ t\in {\text{Supp}}( {\mathbb{P}}_{ {\gamma}, {\cal{C}}}) \subseteq $ ${\cal{T}}_{ {\gamma}, {\cal{C}}}(\langle(\sigma_{10}\dots\sigma_{n0})\rangle) $ of orchestrator $ \gamma $, there is a prefix $ h\in {\text{prefixes}}(t) $ such that $ h\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}} $ and $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) $. This means that t is successful, and therefore, all executions are successful, i.e., the definition of realizability. □

Theorem 2 Assume the reachability task $ \varphi $ is realizable. If an orchestrator $ {\gamma} $ satisfies Eq. (4), then it realizes the reachability task $ \varphi $.

Proof Since by assumption the reachability task $ \varphi $ is realizable, then there exists an orchestrator $ {\gamma}' $ that realizes it. By Theorem 1, we can deduce that the optimal value of $ {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}}( {\gamma}') $ is $ 1 $. Moreover, by assumption and by Eq. (4), it follows that $ {\gamma} \in \Gamma_{ {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}}} $, i.e. $ {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}}( {\gamma}) = 1 $, by the arguments above. Finally, again by Theorem 1, we get that $ {\gamma} $ realizes $ \varphi $. □

Finally, we formally state the stochastic version of our problem:

Problem 1 (Stochastic Composition for ʟᴛʟ_f Reachability Tasks). Given the pair $ ( {\cal{C}}, \varphi) $, where $ \varphi $ is an ʟᴛʟ_f reachability task over the set of actions $ A $, and $ {\cal{C}} $ is a community of $ n $ stochastic services $ {\cal{C}} = \{ {\cal{S}}_1,\dots, {\cal{S}}_n\} $, compute an orchestrator that is optimal according to Eq. (4).

Interestingly, Theorems 1 and 2 show that one can find an orchestrator even in a non-stochastic setting by considering arbitrary services' probability distributions for $ P_i(\sigma_i,a) $, for any pair $ \sigma_i $ and $ a $, whose support is compatible with $ \delta_i $, and then check whether $ \max_ {\gamma} {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}}( {\gamma}) = 1 $.

Example 3 Continuing example 2, we aim to find a composition of the available bots $ {\cal{B}}_1, {\cal{B}}_2, {\cal{B}}_3 $ so as to maximize the probability of the satisfaction of the reachability task $ \varphi $, which also considers rewards/costs. The $ {clean} $ action can only be delegated to $ {\cal{B}}_1 $, and the optimal solution must take into account its stochastic behaviour to correctly compute the expected cost. Regarding the $ {pluck} $ action, both $ {\cal{B}}_2 $ and $ {\cal{B}}_3 $ can perform it. However, the optimal orchestrator will not dispatch it to $ {\cal{B}}_3 $ because, despite the cost being smaller than the one in $ {\cal{B}}_2 $, choosing $ {\cal{B}}_3 $ will lead to a probability of $ 0.1 $ of not reaching the final state configuration since the state $ c_2 $ is a failure state while choosing $ {\cal{B}}_3 $ does not compromise the optimal probability of goal satisfaction.

4.1. Solution technique

The solution technique is based on finding an optimal policy for a bi-objective lexicographic optimization on a specifically built MDP. In particular, we consider a variant of the framework introduced in study by Busatto-Gaston et al.^[35]: while as the second objective they considered the expected number of steps to a target, here we consider the expected cost. Given an instance of the reachability composition problem $ ( {\cal{C}},\varphi) $, our technique breaks down into the following steps: (1) first, we translate the ${\rm{LTL}} _f $ formula into the equivalent DFA $ {\cal{A}}_\varphi $; then (2) we combine $ {\cal{A}}_\varphi $ with the services, computing a product of $ {\cal{A}}_{\varphi} $ with the stochastic services in $ {\cal{C}} $, obtaining a new MDP, $ {\cal{M}}' $, that we call the 'composition MDP'; (3) we solve the resulting MDP, finding a policy $ \pi $ for $ {\cal{M}}' $ that is optimal w.r.t. the bi-objective lexicographic function, as in study by Busatto-Gaston et al.^[35], and then (4) we derive an orchestrator $ {\gamma} $ from $ \pi $ that is optimal w.r.t. Eq. (4). We now detail each step.

Step 1. The dfa of an ʟᴛʟ_f formula can be computed by exploiting a well-known correspondence between ʟᴛʟ_f formulas and automata on finite words^[11]. That is, we can compute a dfa $ {\cal{A}}_\varphi = (A, Q, q_0, F, \delta) $ that is equivalent to the specification $ \varphi $, which can be double-exponentially larger than the size of the formula.

Step 2. Next, we define the Composition MDP $ {\cal{M}}' = (S', A', P', s'_0) $ as follows:

● $ S' = Q\times\Sigma_1\times\cdots\times \Sigma_n $;

● $ A' = A\times \{1\dots n\} $;

● $ s'_0 = (q_0,\sigma_{10}\dots\sigma_{n0}) $;

● $ P'(q',\sigma'_1\dots\sigma'_i\dots\sigma'_n|q,\sigma_1\dots \sigma_i\dots \sigma_n, (a,i)) = P_i(\sigma'_i|\sigma_i, a) $ if $ \delta(q,a) = q' $ and $ \sigma'_{j} = \sigma_j $ for all $ j\neq i $, otherwise $ 0 $.

Moreover, let the composition cost function $ C': S'\times A' \to {\mathbb{R}}^+ $ be defined as $ C'((q,\sigma_1\dots\sigma_i\dots\sigma_n), (a,i)) = C_i(\sigma_i, a) $.

We are interested in computing optimal policies for $ {\cal{M}}' $, where the optimality is defined as follows. As the set of target states $ T $, we consider:

$ T = \{(q,\sigma_1,\dots,\sigma_n) \mid (q,\sigma_1,\dots,\sigma_n)\in S' \wedge q\in F \wedge \forall i = 1,\dots,n\,: \sigma_i \in F_i\} $

Intuitively, $ T\subseteq S' $ is the set of states of $ {\cal{M}}' $ where the state component from $ {\cal{A}} $, $ q $, is accepting, and the state components from the services $ {\cal{S}}_i $, $ \sigma_i $, are accepting in $ {\cal{S}}_i $. We consider the bi-objective lexicographic optimization over $ {\cal{M}}' $, similarly to what has been done in study by Busatto-Gaston et al.^[35]. In particular, we first consider the probability of reaching a set of target states $ T $ from $ s\in S' $, following a policy $ \pi $ over the MDP $ {\cal{M}}' $, denoted with $ {\mathbb{P}}_{ {\cal{M}}'_\pi,s}(\lozenge T) $; with $ \Pi_{ {\cal{M}}',s}(\lozenge T) $, we denote the set of policies with the maximum probability of reaching $ T $, i.e. $ \arg\max_\pi {\mathbb{P}}_{ {\cal{M}}'_\pi,s}(\lozenge T) $. Then, we consider the cost of the shortest prefix of $ \rho $ that reaches one of the target states in $ T $, i.e. $ {\text{cost}}_{T}(\rho) = \sum_{k = 0}^{i} C'(s'_k, a'_k) $ if $ \rho[i]\in T $ and for all $ j \lt i $, $ \rho[j] \not\in T $. An optimal solution for $ {\cal{M}}' $ is a policy $ \pi $ that minimizes the conditional expected cost of reaching a target state $ {\mathbb{E}}_{\rho\sim {\cal{M}}'_{\pi'},s'_0}[ {\text{cost}}_{T}(\rho) | \lozenge T] $ among the policies in $ \Pi_{ {\cal{M}}',s'_0}(\lozenge T) $, that is, the policies which maximize $ {\mathbb{P}}_{ {\cal{M}}'_\pi,s'_0}(\lozenge T) $, i.e.:

$ \pi\in \Pi_{ {\cal{M}}',s_0}(\lozenge T) \text{ and } \pi\in\arg\min\limits_{\pi'} {\mathbb{E}}_{\rho\sim {\cal{M}}'_\pi,s'_0}\big[ {\text{cost}}_T(\rho) | \lozenge T\big] $

(6)

Step 3. The solution technique we will use is based on the work^[35], where the authors propose a two-stage technique to find an optimal policy for a bi-objective lexicographic function in the form of Eq. (6). First, we compute the set of policies (in the form of a set of optimal actions for each state) that maximize the probability of reaching the target states; however, this set of policies also contains the 'deferral' policies, i.e. policies that defer the actual reaching of the target states indefinitely, but in such a way that the target can still be reached with maximum probability at any moment. Then, we consider a 'pruned MDP' in which (i) only optimal action can be taken, and (ii) only states from which the target can be reached are kept. The new MDP is used to find policies that minimize the expected cost of reaching the target. By construction, the optimal policy of the pruned MDP guarantees the target is always reached since any deferral policy will incur an infinite cost. The difference between our scenario and Busatto-Gaston et al.^[35] is that they consider the length of the path, rather than its cost, as the second objective function. Nevertheless, it is easy to see that their approach works if, instead of considering the expected length of successful paths, we consider their expected total costs (i.e., minimizing path length can be seen as minimizing costs with each transition having unitary cost). Note that the techniques used to find the solutions are standard: the first stage requires solving the maximal reachability probability problem^[38] on the composition MDP with the accepting end components as the set of states $ T $. The second stage requires solving a stochastic shortest path problem^[24] on the pruned MDP. The two subproblems can be solved efficiently using standard planning algorithms, e.g., Value Iteration or Linear Programming.

Step 4. Once an optimal policy is found, we can obtain its equivalent $ {\gamma} $ as follows: for any finite prefix of a run $ \rho = (q_0,\sigma_{10}\dots\sigma_{n0}), (a_1,o_1),\dots(a_m,o_m), (q_m,\sigma_{1m}\dots\sigma_{nm}),\dots $, we set $ {\gamma}((\sigma_{10}\dots\sigma_{n0}) \dots(\sigma_{1m}\dots\sigma_{nm})) = (a_{m+1},o_{m+1}) $, where $ \pi(\rho) = (a_{m+1}, o_{m+1}) $.

Now we aim to establish a relationship between optimal orchestrators according to Eq. (4), and optimal policies for $ {\cal{M}}' $ according to Eq. (6). Given an infinite run $ \rho = (q_0,\sigma_{10}\dots\sigma_{n0}),(a_1,o_1)\dots $, we define the trace $ t = {\tilde \tau}_{\varphi, {\cal{C}}}(\rho) = (\sigma_{10}\dots\sigma_{n0}),(a_1,o_1),\dots $. The definition easily applies to finite prefixes of $ \rho $ but this time mapping into histories of $ t $.

In the following, we are going to prove a sequence of lemmata. Lemma 3 shows that, once fixed a policy $ \pi $ over $ {\cal{M}}' $, there is a one-to-one correspondence between paths $ \rho $ in $ {\cal{M}}' $ following $ \pi $ and the executions $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}} $ of the equivalent orchestrator of $ \pi $, $ {\gamma} $; Lemma 4 shows that the probabilities of finite paths and histories are the same; Lemma 5 shows that paths that end with states in $ T $ correspond to successful histories; and Lemma 6 shows a correspondence between paths in $ {\text{Paths}}_{T, {\cal{M}}'_\pi} $ and $ {\cal{H}}_{ {\gamma}, {\cal{C}}}^\varphi $.

Lemma 3 Let $ \pi $ be a policy for $ {\cal{M}}' $ and let $ {\gamma} $ be its equivalent orchestrator. Moreover, let $ \rho\in {\text{Paths}}_{{ {\cal{M}}'}_\pi}^\omega $ and $ t $ be a trace such that $ t = {\tilde \tau}_{\varphi, {\cal{C}}}(\rho) $. Then, $ \rho\in {\text{Paths}}_{ {\cal{M}}'_\pi}^\omega(\langle s'_0 \rangle) $ iff $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}}(\langle (\sigma_{10}\dots\sigma_{n0})\rangle) $.

Proof Let the infinite path $ \rho\in {\text{Paths}}_{ {\cal{M}}'_{\pi}}^\omega(\langle s'_0 \rangle) $, and the infinite trace $ t = {\tilde \tau}_{\varphi, {\cal{C}}}(\rho)) $. We prove the claim by induction on the position of the run/trace.

Base case: we have the claim holds for position $ 0 $ because $ \rho[0] = (q_0,\sigma_{10}\dots\sigma_{n0}) $, and $ h[0] = \sigma_{10}\dots\sigma_{n0} $. Therefore, $ \langle h[0]\rangle $ satisfies the conditions of the definitions of history and execution of $ {\gamma} $ iff $ s'_0\in S' $.

Inductive case: assume the claim holds up to position $ k\ge 0 $.

($ \Rightarrow $) Consider the $ (k+1) $-th action according to $ \pi $, i.e. $ \pi(\rho[0: k]) = (a_{k+1}, o_{k+1}) $, and its successor state $ \rho[k+1] = (q_{k+1}, \sigma_{1,k+1}, \dots, \sigma_{n,k+1}) $. By inductive hypothesis, we have that $ h = t[0:k] $ is a valid execution under orchestrator $ {\gamma} $, where $ t = {\tilde \tau}_{\varphi, {\cal{C}}}(\rho) $. We need to show that $ h' = t[0:k],(a_{k+1}, o_{k+1}), (\sigma_{1,k+1},\dots,\sigma_{n,k+1}) $ is a valid extension of $ h $ under orchestrator $ {\gamma} $. By construction of $ {\cal{M}}' $, we have that (i) $ \sigma_{i,k+1}\in \Sigma_i $ for all services, (ii) $ o_{k+1}\in\{1\dots n\} $, (iii) $ a_{k+1}\in A $. We also aim to show that (iv) for all $ j = 1,\dots,n $, $ \sigma_{j,k+1}\in {\text{Supp}}(P_{j}(\sigma_{j,k},a_{k+1})) $ if $ j = o_{k+1} $; otherwise $ \sigma_{j,k+1} = \sigma_{j,k} $. Proposition (iv) holds because of the definition of $ P' $ and by the hypothesis of $ \rho $ being a path of $ {\cal{M}}' $ (the support of the transition probability function of all the states along a path must be non-empty). The propositions (i), (ii), (iii), (iv) are precisely the conditions that each step of a valid execution must satisfy. Moreover, by construction of $ {\gamma} $, $ (a_{k+1}, o_{k+1}) = $ ${\gamma}( {\text{states}}(t[0: k])) $, hence $ h' = t[0: k], (a_{k+1},o_{k+1}),(\sigma_{1,k+1}\dots\sigma_{n,k+1}) $ is a proper (finite) execution under orchestrator $ {\gamma} $.

($ \Leftarrow $) Now, consider a finite execution under orchestrator $ {\gamma} $, $ h = t[0:k] $, and its extension $ h' = t[0:k],(a_{k+1},o_{k+1}),(\sigma_{1,k+1}\dots\sigma_{n,k+1}) $. We need to show that the finite path prefix $ \rho[0:k+1] $ is a valid finite path in $ {\cal{M}}' $ under policy $ \pi $. Since by inductive hypothesis the claim holds up to the $ k $-th step, we only need to show that the $ (k+1) $-th step defined as $ (a_{k+1}, o_{k+1}),(q_{k+1}, \sigma_{1,k+1}\dots\sigma_{n,k+1}) $, for some $ q_{k+1} $, is a valid extension of $ \rho[0:k] $ under policy $ \pi $. By construction, we have that $ \pi(\rho[0:k]) = {\gamma}( {\text{states}}(h')) = (a_{k+1}, o_{k+1}) $. Regarding the successor state, we have to show that $ (q_{k+1}, \sigma_{1,k+1}\dots\sigma_{n,k+1})\in {\text{Supp}}(P') $. But this follows from our choice of $ q_{k+1} = \delta(q_k,a_{k+1}) $ and by the condition (iv) of a valid execution step.

By induction, the claim also holds for any arbitrary position, and therefore $ \rho\in {\text{Paths}}_{ {\cal{M}}'_\pi}^\omega(\langle s'_0 \rangle) $ iff $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}}(\langle (\sigma_{10}\dots\sigma_{n0})\rangle) $. □

Lemma 4 Let $ \pi $ a policy on $ {\cal{M}}' $, $ {\gamma} $ be its equivalent orchestrator, $ \rho = s'_0a_1\dots s'_m\in {\text{Paths}}_{ {\cal{M}}'_\pi}(s'_0) $ be a finite path on $ {\cal{M}}' $, and $ h = {\tilde \tau}_{\varphi, {\cal{C}}}(\rho) $ be its associated history. Then, $ {\mathbb{P}}_{ {\cal{M}}'_\pi,s'_0}( {\text{Paths}}_{ {\cal{M}}'_\pi}^\omega(\rho)) = {\mathbb{P}}_{ {\gamma}, {\cal{C}}}( {\cal{T}}_{ {\gamma}, {\cal{C}}}(h)) $.

Proof

$ {\mathbb{P}}_{ {\cal{M}}'_\pi,s'_0}( {\text{Paths}}_{ {\cal{M}}'_\pi}^\omega(\rho)) = \prod\limits_{k = 1}^{m} P'(s'_{k} \mid s'_{k-1},(a_k,o_k)) $

(7)

$ = \prod\limits_{k = 1}^m P_{o_k}(\sigma_{o_k,k} \mid \sigma_{o_k,k-1}, (a_k,o_k)) $

(8)

$ = {\mathbb{P}}_{ {\gamma}, {\cal{C}}}( {\cal{T}}_{ {\gamma}, {\cal{C}}}(h)) $

(9)

where step 7 is by definition of the probability of a cylinder set, step 8 by definition of $ P' $ in $ {\cal{M}}' $, and step 9 by Eq. (1). □

Lemma 5 Let $ \rho = s_0a_1\dots s_m\in {\text{Paths}}_{ {\cal{M}}'_\pi} $ be a finite path on $ {\cal{M}}'_\pi $ for some policy $ \pi $, and let $ h = {\tilde \tau}_{\varphi, {\cal{C}}}(\rho) $ be its associated history. Then, $ {\text{successful}}(h) $ iff there exists $ k\in [0,m] $ such that $ s_k\in T $.

Proof Let $ k $ be an integer between $ 0 $ and $ m $. By construction of $ {\cal{M}}' $, we have that $ s_k = (q_k,\sigma_{1k}\dots\sigma_{nk})\in T $ iff $ (a) $ $ q_k\in F $ and $ (b) $ for all $ i = 1,\dots,n $ we have $ \sigma_{ik}\in F_i $. The proposition (a) means that the sequence of states $ r = q_0,\dots,q_k $ is an accepting run over $ {\cal{A}}_\varphi $ for trace $ {\text{actions}}(h') = a_1,\dots a_{k} $, where $ h' = h[0:k] $. By definition of acceptance, the above holds iff $ {\text{actions}}(h')\in {\cal{L}}( {\cal{A}}_\varphi) $. By the correctness of the construction of $ {\cal{A}}_\varphi $, this is true iff $ {\text{actions}}(h')\models \varphi $. Finally, since $ h'\in {\text{prefixes}}(h) $, we have $ {\text{actions}}(h')\models\varphi $ and (b) both imply that $ {\text{successful}}(h) $ holds.

Conversely, assume $ {\text{successful}}(h) $. Then by definition there exists a prefix $ h' = h[0:k] $ such that $ {\text{actions}}(h')\models\varphi $ and such that (c) $ \sigma_{ik}\in F_i $ for all $ i = 1,\dots,n $. By the correctness of the construction of $ {\cal{A}}_\varphi $, $ {\text{actions}}(h')\models\varphi $ iff $ {\text{actions}}(h')\in {\cal{L}}( {\cal{A}}_\varphi) $, meaning that there exists an accepting run $ r = q_0,\dots,q_k $ over $ {\cal{A}}_\varphi $, where (d) $ q_k\in F $. Conditions (c) and (d) imply that $ \rho[k] = s_k = (q_k,\sigma_{1k}\dots\sigma_{nk})\in T $, by definition of the set $ T $.□

Let $ {\text{Paths}}_{T, {\cal{M}}'_\pi}(s'_0) $ be the set of finite paths following $ \pi $ on $ {\cal{M}}' $ such that they start with $ s'_0 $ and enter in a state in $ T $ only at the end of the path and for the first time, i.e. $ {\text{Paths}}_{T, {\cal{M}}'_\pi}(s'_0) = ((S' \setminus T ) \times A)^*T \cap {\text{Paths}}_{ {\cal{M}}'_\pi}(s'_0) $.

Lemma 6 $ \rho\in {\text{Paths}}_{T, {\cal{M}}'_\pi}(s'_0) $ iff $ {\tilde \tau}_{\varphi, {\cal{C}}}(\rho)\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}} $

Proof By Lemma 5, $ \rho\in {\text{Paths}}_{T, {\cal{M}}'_\pi} $ iff $ h = {\tilde \tau}_{\varphi, {\cal{C}}}(\rho) $ is successful, since $ \rho $ ends in a state in $ T $. Moreover, since $ {\text{Paths}}_{T, {\cal{M}}'_\pi}\subseteq {\text{Paths}}_{ {\cal{M}}'_\pi} $, by Lemma 3, we have that $ \rho\in {\text{Paths}}_{T, {\cal{M}}'_\pi} $ iff $ h $ is an execution of $ {\gamma} $. Furthermore, by assumption, any finite prefix $ \rho' $, say of length $ m $, of $ \rho $, is such that $ \rho'[m] \not\in T $. Then, again by Lemma 5, this holds iff $ h' = {\tilde \tau}_{\varphi, {\cal{C}}}(\rho') $ is not successful, meaning that does not exist a prefix $ h'\in {\text{prefixes}}(h) $ with $ h'\neq h $ such that $ h' $ is successful. But this is precisely the membership condition for $ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}} $.□

This result shows the correctness of our technique:

Theorem 3 Let $ ( {\cal{C}}, \varphi) $ be an instance of Problem 1, and let $ {\cal{M}}' $ be the composition MDP for $ {\cal{C}} $ and $ \varphi $. We have that $ \pi $ is optimal (w.r.t. Eq. [6]) if its equivalent orchestrator $ {\gamma} $ is optimal (w.r.t. Eq. [4]).

Proof First, we show that $ \pi = \arg\max_{\pi'} {\mathbb{P}}_{ {\cal{M}}_\pi',s'_0}(\lozenge T) $ iff $ {\gamma} = \arg\max_{ {\gamma}'} {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}}( {\gamma}') $. For any pair $ \pi $ and its equivalent $ {\gamma} $, we have:

$ {\mathbb{P}}_{\pi,s'_0}(\lozenge T) = \sum\limits_{\rho_T\in {\text{Paths}}_{T,\pi}(s'_0)} {\mathbb{P}}_{ {\cal{M}}'_{\pi},s'_0}( {\text{Paths}}_{ {\cal{M}}'_\pi}^\omega(\rho_T)) $

(10)

$ = \sum\limits_{h\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} {\mathbb{P}}_{ {\gamma}, {\cal{C}}}( {\cal{T}}_{ {\gamma}, {\cal{C}}}(h)) $

(11)

$ = {\mathbb{P}}_{ {\gamma}, {\cal{C}}}\bigg(\bigcup_{h\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) \bigg) $

(12)

$ = {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}}( {\gamma}) $

(13)

where step 10 is by definition of probabilistic reachability, step 11 is by Lemma 4 and 6, step 12 is by disjointness of all $ {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) $ for $ h\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}} $, and step 13 is by Lemma 1 and Eq. (2). From this, we obtain that $ \pi^* = \arg\max_{\pi'} {\mathbb{P}}_{ {\cal{M}}_{\pi'},s'_0}(\lozenge T) $ iff $ {\gamma}^* = \arg\max_{ {\gamma}'} {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{reach}}( {\gamma}') $.

It remains to prove that $ \pi $ is cost-optimal iff $ {\gamma} $ is cost-optimal. We have:

$ \begin{array}{l} {\mathbb{E}}_{\rho\sim {\cal{M}}'_\pi,s'_0} [ {\text{cost}}_T(\rho) \mid \lozenge T] = \\ \;\;\;\;\;\;\;\;\;\; = \sum\limits_{\rho_T\in {\text{Paths}}_{T,\pi}(s'_0)} {\mathbb{P}}_{ {\cal{M}}'_{\pi'},s'_0}( {\text{Paths}}_{ {\cal{M}}'_\pi}^\omega(\rho_T)) \cdot \sum\limits_{k = 0}^{|\rho_T|} C'(s'_k,a'_{k+1}) \end{array} $

(14)

$ = \sum\limits_{h\in {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} {\mathbb{P}}_{ {\gamma}, {\cal{C}}}( {\cal{T}}_{ {\gamma}, {\cal{C}}}(h)) \cdot \sum\limits_{k = 0}^{|h|} C_{o_{k+1}}(\sigma_{o_k,k},a_{k+1}) $

(15)

$ = {\mathbb{E}}_{h\sim {\mathbb{P}}_{ {\gamma}, {\cal{C}}}}\Bigg[ \sum\limits_{k = 1}^{|h|} C_{o_k}(\sigma_{o_k,k-1},a_{k}) \bigg\vert\ {\text{successful}}(h) \Bigg] $

(16)

$ = {\cal{J}}^{ {{\text{cost}}}}_{ {\cal{C}},\varphi}( {\gamma}) $

(17)

where step 14 by definition of total expected cost conditioned on reaching of target states $ T $, step 15 by construction of $ {\cal{M}}' $ and by Lemma 4 and 6, step 16 by definition of total expected cost on successful executions of $ {\gamma} $, and step 17 by Eq. (3). Therefore, if $ \pi\in \arg\min_{\pi'} {\mathbb{E}}_{\rho\sim {\cal{M}}'_\pi,s'_0}[ {\text{cost}}_T(\rho) \mid \lozenge T] $ then $ {\gamma}\in \arg\min_{ {\gamma}'} {\cal{J}}^{ {{\text{cost}}}}_{ {\cal{C}},\varphi}( {\gamma}') $. Combining both results, we get the thesis. □

Computational cost. Theorem 3 guarantees that we can reduce Problem 1 to the problem of finding an optimal policy for the lexicographic bi-objective optimization problem (Eq. [6]) over a composition MDP $ {\cal{M}}' $. As explained above, the two-stage technique requires solving a planning problem over MDPs. Since it is known that both steps require polynomial time complexity in the number of states and actions of the MDP^[24] and that our composition MDP has a state space that is double-exponentially larger in the size of the goal specification, we get this result:

Theorem 4 Problem 1 can be solved in at most double-exponential time in the size of the formula, in at most exponential time in the number of services, and in polynomial time in the size of the services.

This is in line with the classical setting of ʟᴛʟ/ʟᴛʟ_f synthesis on probabilistic systems^[39,40], and analogous to our solution method for the non-stochastic case^[41].

5. Stochastic service composition for safety ʟᴛʟ_f tasks

In this section, we consider the composition problem where the first objective is to maximize the probability of not violating a safety ʟᴛʟ_f specification, and as second objective, we maximize the expected conditional cost. This means that the notion of successful execution must be revised in order to consider a safety goal rather than a reachability goal.

We say that some infinite execution $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}} $ is legal (for $ {\cal{C}} $ and $ \varphi $), denoted with $ {\text{legal}}_{ {\cal{C}},\varphi}(h) $, if for all finite prefix histories $ h\in {\text{prefixes}}(t) $ we have that (1) $ {\text{actions}}(h)\models\varphi $, i.e. the safety condition is not violated at any time on the trace, and (2) for all positions $ k $ and all services $ i $, it holds that $ \sigma_{i,k}\in F_i $, i.e. every service is in an accepting state. We use $ {\text{legal}}_{ {\cal{C}},\varphi}(h) $ also for finite executions. We denote with $ {{\cal{H}}}^{\varphi,{\text{safe}}}_{ {\gamma}, {\cal{C}}} $ the set of finite and legal execution of orchestrator $ {\gamma} $ on service community $ {\cal{C}} $. Finally, we say that an orchestrator $ {\gamma} $ realizes the ʟᴛʟ_f specification $ \varphi $ with $ {\cal{C}} $ if, for all traces $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}} $, $ t $ is legal. When $ {\cal{C}} $ and $ \varphi $ are clear from the context, we omit the suffix in $ {\text{legal}}_{ {\cal{C}},\varphi} $ and only write $ {\text{legal}} $.

Example 4 Continuing with example 1, we consider the following ${\rm{LTL}} _f $ safety task: $ \Box\lnot{pluck} $, i.e. never $ {pluck} $ the ripe fruits and flowers. For this safety task, differently from Fig. 1, we consider states $ a_1 $, $ b_1 $, and $ c_2 $ to be 'accepting', meaning that they are states of their respective service in which the execution of the composition safety task can enter into. An orchestrator that maximizes the probability of satisfying this goal will only use $ {\cal{B}}_1 $ and $ {\cal{B}}_2 $, since $ {\cal{B}}_3 $ can only perform $ {pluck} $ from state $ c_0 $, whose execution would make the safety task to fail.

The satisfaction probability of the safety task $ \varphi $ under orchestrator $ {\gamma} $ and community $ {\cal{C}} $ is given by:

$ {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}( {\gamma}) = {\mathbb{P}}_{ {\gamma}, {\cal{C}}}\bigg(\big\{t \in {\cal{T}}_{ {\gamma}, {\cal{C}}} \,\big\vert\, {\text{legal}}(t)\big\}\bigg) $

(18)

Intuitively, $ {\cal{P}}_{ {\cal{C}},\varphi}^ {\text{safe}}( {\gamma}) $ is the probability measure (under $ {\mathbb{P}}_{ {\gamma}, {\cal{C}}} $) of the set of legal executions generated by executing the orchestrator $ {\gamma} $ over the service community $ {\cal{C}} $.

To formally define the second objective, we first define the total cost of horizon $ m $ for an execution $ t = (\sigma_{10}\dots\sigma_{n0}),(a_1,o_1),\dots $ as $ {{\text{Cost}}^{{\rm{C}}}_m}(t) = \sum^{m-1}_{i = 0} C_{o_i}(\sigma_{o_i}, a_{i+1}) $. Then, for an orchestrator $ {\gamma} $ and a state $ \sigma = (\sigma_1,\dots,\sigma_n) $, the expected mean-cost is defined as:

$ {\mathbb{E}}_{ {\gamma},\sigma}[MC] = \limsup\limits_{m\to\infty} \frac{1}{m} {\mathbb{E}}_{h\sim {\gamma},\sigma}[ {{\text{Cost}}^{{\rm{C}}}_m}] $

(19)

The optimal expected average cost starting from a state $ \sigma $ is defined over all orchestrators $ {\gamma} $ for $ {\cal{C}} $ as $ \inf_ {\gamma}{\mathbb{E}}_{ {\gamma}, \sigma}[MC] $. Moreover, we use $ {\mathbb{E}}_{h\sim\mathbb{P}_{ {\gamma}, {\cal{C}}}}[ {{\text{Cost}}^{{\rm{C}}}_m} \mid {\text{legal}}(h)] $ to denote the expected conditional finite horizon cost, where the condition is on whether $ h $ is legal. Finally, we define our second objective, the expected conditional mean-cost, as:

$ {\cal{J}}^{ {{\text{avg}\text-\text{cost}}}}_{ {\cal{C}},\varphi}( {\gamma}) = \limsup\limits_{m\to\infty} \frac{1}{m}{\mathbb{E}}_{h\sim\mathbb{P}_{ {\gamma}, {\cal{C}}}}\Bigg[ {{\text{Cost}}^{{\rm{C}}}_m} \mid {\text{legal}}(h)\Bigg] $

(20)

Intuitively, $ {\cal{J}}^{ {{\text{avg}\text{-}\text{cost}}}}_{ {\cal{C}},\varphi}( {\gamma}) $ takes into account the costs that the orchestrator incurs in executing its policy by picking the chosen services, but averaged across the entire length of the execution.

Finally, we define our optimization problem. We want to compute an orchestrator $ {\gamma} $ such that the following holds:

$ {\gamma}\in\Gamma_{ {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}} \text{ and } {\cal{J}}^{ {{\text{avg}\text-\text{cost}}}}_{ {\cal{C}},\varphi}( {\gamma}) = \inf\limits_{ {\gamma}'\in\Gamma_{ {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}}} {\cal{J}}^{ {{\text{avg}\text-\text{cost}}}}_{ {\cal{C}},\varphi}( {\gamma}') $

(21)

Intuitively, we fix a lexicographic order on the objective functions $ {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi} $ and $ {\cal{J}}^{ {{\text{avg}\text{-}\text{cost}}}}_{ {\cal{C}},\varphi} $, meaning that we aim to minimize the expected utilization mean-cost to satisfy the specification, conditioned to the satisfaction of the specification, while guaranteeing the optimal probability of satisfying it.

Now we aim to find a connection between the satisfaction probability and the notion of realizability.

Let us consider the set of finite executions of $ {\gamma} $ with the community $ {\cal{C}} $ that becomes illegal for the first time, denote with $ \overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} $. Formally:

$ \overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} = \{ h\in {\cal{H}}_{ {\gamma}, {\cal{C}}} \mid \lnot {\text{legal}}(h) \wedge \forall h'\in( {\text{prefixes}}(h)\setminus\{h\}): {\text{legal}}(h') \} $

(22)

Lemma 7 Let $ t $ be an infinite execution of $ {\gamma} $ over $ {\cal{C}} $. $ t $ is legal iff$ t\not\in \left(\bigcup_{h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) \right) $

Proof $ (\Rightarrow) $ by contrapositive, assume $ t\in (\bigcup_{h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h)) $. Then, there exist some $ h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} $ such that $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) $. By definition of $ \overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} $, we have that $ \lnot {\text{legal}}(h) $. Since $ h\in {\text{prefixes}}(t) $ by definition of $ {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) $, we have that, by definition of $ {\text{legal}} $, and from the above arguments, it follows that $ \lnot {\text{legal}}(t) $, i.e., there exist a finite prefix of $ t $ that is not legal, and therefore its infinite extension $ t $ is also not legal.

$ (\Leftarrow) $ by contrapositive, assume $ t $ is not legal. This means that there exists a prefix $ h\in {\text{prefixes}}(t) $ such that $ \lnot {\text{legal}}(h) $. Let $ h' $ be the shortest of such prefixes, i.e., the finite execution that violates the safety conditions for the first time. By definition of $ \overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} $, it holds that $ h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} $. Finally, since $ h'\in {\text{prefixes}}(t) $, we also have $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}}(h') $. Therefore, the claim holds. □

Observe that, from Lemma 7, the following equality holds:

$ \big\{t \in {\cal{T}}_{ {\gamma}, {\cal{C}}} \,\big\vert\, {\text{legal}}(t)\big\} = {\cal{T}}_{ {\gamma}, {\cal{C}}} \setminus \bigg(\bigcup\limits_{h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) \bigg) $

(23)

And in turn, from Eq. (23), we have:

$ \begin{split} {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}( {\gamma}) =\;&{\mathbb{P}}_{ {\gamma}, {\cal{C}}}\bigg(\big\{t \in {\cal{T}}_{ {\gamma}, {\cal{C}}}) \,\big\vert\, {\text{legal}}(t)\big\}\bigg)\\ =\;& {\mathbb{P}}_{ {\gamma}, {\cal{C}}}\Bigg( {\cal{T}}_{ {\gamma}, {\cal{C}}} \setminus \bigg(\bigcup_{h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) \bigg)\Bigg)\\ =\;& {\mathbb{P}}_{ {\gamma}, {\cal{C}}}( {\cal{T}}_{ {\gamma}, {\cal{C}}}) - {\mathbb{P}}_{ {\gamma}, {\cal{C}}}\Bigg( {\cal{T}}_{ {\gamma}, {\cal{C}}} \cap \bigg(\bigcup_{h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) \bigg)\Bigg)\\ =\;& 1 - {\mathbb{P}}_{ {\gamma}, {\cal{C}}}\bigg(\bigcup_{h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) \bigg) \end{split} $

(24)

The intuitive meaning of the expression in Eq. (24) for $ {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi} $ is that we can define the probability of satisfaction of the safety specification as the probability of avoiding finite executions $ h $ such that $ \lnot {\text{legal}}(h) $.

In case the specification is exactly realizable, the notion of optimal orchestrator according to Eq. (21) coincides with the notion of 'safe realizability', as shown in the following results.

Lemma 8 Orchestrator $ {\gamma} $ realizes the safety specification $ \varphi $ over $ {\cal{C}} $ iff $ \overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} = \emptyset $.

Proof ($ \Rightarrow $) Assume $ \overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} \neq \emptyset $. By assumption, there exist one finite execution $ h $ in the set $ \overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} $. By Lemma 8, this implies that all infinite executions $ t $ that are extensions of $ h $, i.e., $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) $, are not legal. But this means that there exists at least one execution of $ {\gamma} $ that does satisfy the safety specification $ \varphi $; hence, $ {\gamma} $ does not realize the safety specification $ \varphi $.

($ \Leftarrow $) Assume that $ {\gamma} $ does not realize the safety specification $ \varphi $. This means that there exist a trace $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}} $ such that $ \lnot {\text{legal}}(t) $. By Lemma 7, this implies that $ t\in {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) $, for some $ h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} $. Therefore, $ \overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}}\neq\emptyset $. □

Theorem 5 Let $ {\cal{C}} $ be a community of stochastic services, and $ \varphi $ be a ${\rm{LTL}} _f $ safety specification. The orchestrator $ {\gamma} $ realizes $ \varphi $ with community $ {\cal{C}} $ iff $ {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}( {\gamma}) = 1 $.

Proof ($ \Rightarrow $) if $ {\gamma} $ realizes $ \varphi $, then by Lemma 8, $ \overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} = \emptyset $, and therefore $ {\mathbb{P}}_{ {\gamma}, {\cal{C}}}(\bigcup_{h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h)) = 0 $. Finally, by the definition of $ {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi} $ (Eq. [18]) and by Eq. (24), we have that $ {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}( {\gamma}) = 1 - {\mathbb{P}}_{ {\gamma}, {\cal{C}}}(\bigcup_{h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h)) = 1 $

($ \Leftarrow $) If $ {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}( {\gamma}) = 1 $, then by definition of $ {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi} $ (Eq. [18]) and by Eq. (24) we have that $ {\mathbb{P}}_{ {\gamma}, {\cal{C}}}(\bigcup_{h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h)) = 0 $. Since the left-hand side is a probability, the equation holds only if the argument of $ {\mathbb{P}}_{ {\gamma}, {\cal{C}}} $ is empty, which in turn implies that $ \overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} = \emptyset $. Finally, the claim follows by applying Lemma 8.□

Theorem 6 Assume ʟᴛʟ_f safety specification $ \varphi $ is realizable. If an orchestrator $ {\gamma} $ satisfies Eq. (21), then it realizes the specification $ \varphi $.

Proof Since by assumption $ \varphi $ is realizable, then there exists an orchestrator $ {\gamma}' $ that realizes it. By Theorem 5, we can deduce that the optimal value of $ {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}( {\gamma}') $ is $ 1 $. Moreover, by assumption and by Eq. (21), it follows that $ {\gamma} \in \Gamma_{ {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}} $, i.e. $ {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}( {\gamma}) = 1 $, by the arguments above. Finally, again by Theorem 5, we get that $ {\gamma} $ realizes $ \varphi $.

Finally, we formally state the stochastic version of our problem:

Problem 2 (Stochastic Composition for ltl_f Safety Tasks). Given the pair $ ( {\cal{C}}, \varphi) $, where $ \varphi $ is an ʟᴛʟ_f safety specification over the set of actions $ A $, and $ {\cal{C}} $ is a community of $ n $ stochastic services $ {\cal{C}} = \{ {\cal{S}}_1,\dots, {\cal{S}}_n\} $, compute an orchestrator that is optimal according to Eq. (21).

As in the reachability case of Section 4, Theorems 5 and 6 show that one can find an orchestrator even in a non-stochastic setting by considering arbitrary services' probability distributions for $ P_i(\sigma_i,a) $, for any pair $ \sigma_i $ and $ a $, whose support is compatible with the service's transition function (even non-deterministic, as in study by De Giacomo et al.^[41]), and then check whether $ \max_ {\gamma} {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}( {\gamma}) = 1 $.

5.1. Solution technique

Similarly to the previous case, the solution technique for the safety variant of the composition problem relies on finding an optimal policy for a bi-objective lexicographic optimization on a specifically built MDP. In particular, also for this problem, we consider a variant of the framework introduced in study by Busatto-Gaston et al.^[35]; while the second objective they considered was the maximization of the expected conditional mean payoff, here we consider the minimization of the expected conditional mean costs. In general, the technique has a significant overlap with the reachability variant. However, the main difference lies in how we build the automata-based construction: in the reachability case, we are interested in an orchestrator such that, for any (infinite) execution $ t $, a finite prefix of $ t $ satisfies the task $ \varphi $; in the safety case, we are interested in an orchestrator such that, for all prefixes, the safety task $ \varphi $ is satisfied. The definition of the composition MDP must take into account this different requirement.

Our technique breaks down into the following steps: (1) first, we translate the ${\rm{LTL}} _f $ formula into the equivalent $ {\rm{DFA}}$ $ {\cal{A}}_\varphi $; then (2) we combine $ {\cal{A}}_\varphi $ with the services, computing a product of $ {\cal{A}}_{\varphi} $ with the stochastic services in $ {\cal{C}} $, obtaining a new MDP, $ {\cal{M}}' $, that we call the 'composition MDP'; (3) we solve the resulting MDP, finding a policy $ \pi $ for $ {\cal{M}}' $ that is optimal w.r.t. the (safety) bi-objective lexicographic function, as in study by Busatto-Gaston et al.^[35], and then (4) we derive an orchestrator $ {\gamma} $ from $ \pi $ that is optimal w.r.t. Eq. (4).

Step 1. This step is the same as the previous case: starting from the ${\rm{LTL}} _f $ specification $ \varphi $, we compute the $ {\rm{DFA}}$ $ {\cal{A}}_\varphi = (A, Q, q_0, F, \delta) $ that is equivalent to the specification $ \varphi $^[11].

Step 2. Then, we define the composition MDP $ {\cal{M}}' = (S', A', P', s'_0) $ with the associated cost function $ C' $, as in step 2 for the reachability case. We are interested in computing optimal policies for $ {\cal{M}}' $, where the optimality is defined as follows. Consider the set of bad states:

$ {\text{Bad}} = \{(q,\sigma_1,\dots,\sigma_n) \mid (q,\sigma_1,\dots,\sigma_n)\in S' \wedge (q\not\in F \vee \exists i\in [1,n]\,: \sigma_i \not\in F_i)\} $

In other words, $ {\text{Bad}}$ is the set of states such that their $ Q $-component is not an accepting state or the current state of some service is not accepting. We consider the safety bi-objective lexicographic optimization over $ {\cal{M}}' $, similarly to what has been done in study by Busatto-Gaston et al.^[35]. In particular, we first consider the probability of avoiding a set of bad states $ {\text{Bad}} $ from $ s\in S' $, following a policy $ \pi $ over the MDP $ {\cal{M}}' $, denoted with $ {\mathbb{P}}_{ {\cal{M}}'_\pi,s}(\Box \lnot {\text{Bad}}) $; with $ \Pi_{ {\cal{M}}',s}(\Box \lnot {\text{Bad}}) $, we denote the set of policies with the maximum probability of avoiding states in $ {\text{Bad}} $ from state $ s $, i.e. $ \arg\max_\pi {\mathbb{P}}_{ {\cal{M}}'_\pi,s}(\Box\lnot {\text{Bad}}) $. Then, as the second objective, we consider the expected conditional mean-cost $ {\mathbb{E}}_{ {\cal{M}}'_\pi,s'_0}[MC \mid \Box\lnot {\text{Bad}}] $. An optimal solution for $ {\cal{M}}' $ is a policy $ \pi $ that minimizes the expected conditional mean-cost $ \limsup_{m\to\infty} \frac{1}{n} {\mathbb{E}}_{ {\cal{M}}'_\pi,s'_0}[ {{\text{Cost}}_m} \mid \Box\lnot {\text{Bad}}] $, while avoiding the set of bad states, but only among the policies in $ \Pi_{ {\cal{M}}',s'_0}(\Box\lnot {\text{Bad}}) $, i.e., the policies that maximize $ {\mathbb{P}}_{ {\cal{M}}'_\pi,s'_0}(\Box\lnot {\text{Bad}}) $:

$ \pi\in \Pi_{ {\cal{M}}',s'_0}(\Box\lnot {\text{Bad}}) \text{ and } \pi\in\arg\limits_{\pi'}\inf {\mathbb{E}}_{\rho\sim {\cal{M}}'_{\pi'},s_0}[MC \mid \Box\lnot {\text{Bad}}] $

(25)

Step 3. The solution technique we will use is, once again, based on the work^[35], where the authors propose a two-stage technique to find an optimal policy for a bi-objective lexicographic function in the form of Eq. (25). First, we compute the set of policies (in the form of a set of optimal actions for each state) that maximize the probability of staying far from the bad set of states $ {\text{Bad}} $; however, this set of policies also contains policies that do not take into account the utilization costs of the services. Then, we consider a 'pruned MDP' in which (i) only optimal actions can be taken, and (ii) only 'good' states are kept. The new MDP is used to find policies that minimize the expected conditional mean-cost while maximizing the probability of staying in good states. Crucially, observe that, by construction, avoiding states in Bad is equivalent to not violating the safety specification $ \varphi $, since bad states have their $ Q $-component to be states of $ {\cal{A}}_\varphi $. Moreover, by construction, the optimal policy of the pruned MDP guarantees that no better probability of staying safe can be achieved, since sub-optimal actions for the first objective are pruned. In this case, the difference between our scenario and Busatto-Gaston et al.^[35] is that they consider reward maximization, rather than cost minimization, as the second objective function. Nevertheless, it is easy to see that we can minimize costs by maximizing negative rewards. Note that the two subproblems can be solved efficiently using standard planning algorithms, e.g., value iteration or linear programming.

Step 4. Once an optimal policy is found, we can obtain its equivalent $ {\gamma} $ as in step 4 of the reachability case.

Now we are going to establish a relationship between optimal orchestrators according to Eq. (21), and optimal policies for $ {\cal{M}}' $ according to Eq. (25). Note that Lemma 3 and Lemma 4 also hold in this context since the composition MDP construction $ {\cal{M}}' $ is the same as for the reachability case.

Let $ {\text{Paths}}^\omega_{\lnot {\text{Bad}}, {\cal{M}}'_\pi}(s'_0) $ be the set of infinite paths following $ \pi $ on $ {\cal{M}}' $ such that they start with $ s'_0 $ and never enter in bad state in $ {\text{Bad}}$, i.e. $ {\text{Paths}}^\omega_{\lnot {\text{Bad}}, {\cal{M}}'_\pi}(s'_0) = ((S'\setminus {\text{Bad}}) \times A)^\omega\cap {\text{Paths}}^\omega_{ {\cal{M}}'_\pi}(s'_0) $. Let $ {\text{Paths}}_{ {\text{Bad}}, {\cal{M}}'_\pi}(s'_0) $ be the 'dual' set, i.e. the set of finite paths following $ \pi $ on $ {\cal{M}}' $ such that they start with $ s'_0 $ and enter in bad state in $ {\text{Bad}}$ only at the end of the path and for the first time, i.e. $ {\text{Paths}}_{ {\text{Bad}}, {\cal{M}}'_\pi}(s'_0) = ((S'\setminus {\text{Bad}}) \times A)^* {\text{Bad}}\cap {\text{Paths}}_{ {\cal{M}}'_\pi}(s'_0) $. Let also $ {\text{Paths}}^\omega_{ {\text{Bad}}, {\cal{M}}'_\pi}(s'_0) = \bigcup_{\rho'\in {\text{Paths}}_{ {\text{Bad}}, {\cal{M}}'_\pi}(s'_0)} {\text{Paths}}^\omega_{ {\cal{M}}'_\pi}(\rho') $, i.e. the set of infinite paths such that at least once they visited a bad state, and $ {\text{Paths}}_{\lnot {\text{Bad}}, {\cal{M}}'_\pi}(s_0') $ the set of finite paths that never visit a $ {\text{Bad}} $ state. It is easy to see that the two sets $ {\text{Paths}}^\omega_{\lnot {\text{Bad}}, {\cal{M}}'_\pi}(s'_0) $ and $ {\text{Paths}}^\omega_{ {\text{Bad}}, {\cal{M}}'_\pi}(s'_0) $ are disjoint and complete. Intuitively, regarding disjointness, a finite path $ \rho'\in {\text{Paths}}_{ {\text{Bad}}, {\cal{M}}'_\pi}(s'_0) $ cannot be the prefix of any infinite path $ \rho\in {\text{Paths}}^\omega_{\lnot {\text{Bad}}, {\cal{M}}'_\pi}(s'_0) $, since $ {\text{last}}(\rho')\in {\text{Bad}} $ and $ \rho $ does not visit any bad state. Regarding completeness, there are no other cases in between, i.e. a infinite path either belongs to the first set or to the second set:

Lemma 9 The following two propositions hold:

1. $ {\text{Paths}}^\omega_{\lnot {\text{Bad}}, {\cal{M}}'_\pi}(s'_0) \cap {\text{Paths}}^\omega_{ {\text{Bad}}, {\cal{M}}'_\pi}(s'_0) = \emptyset $;

2. $ {\text{Paths}}^\omega_{\lnot {\text{Bad}}, {\cal{M}}'_\pi}(s'_0) \cup {\text{Paths}}^\omega_{ {\text{Bad}}, {\cal{M}}'_\pi}(s'_0) = {\text{Paths}}^\omega_{ {\cal{M}}'_\pi}(s'_0) $.

Another crucial observation is that $ {\text{Paths}}_{ {\text{Bad}}, {\cal{M}}'_\pi}(s'_0) $ is a set of finite paths forming pairwise disjoint cylinder sets. To see this, it is enough to observe that each of these paths has only the last state in Bad, while the previous states are all $ \lnot {\text{Bad}} $; this means that they cannot share a common prefix.

Lemma 10 Let $ \rho = s_0a_1\dots s_m\in {\text{Paths}}_{ {\cal{M}}'_\pi} $ be a finite path on $ {\cal{M}}'_\pi $ for some policy $ \pi $, and let $ h = {\tilde \tau}_{\varphi, {\cal{C}}}(\rho) $ be its associated history. Then, $ {\text{legal}}(h) $ iff for all $ k\in [0,m] $ we have $ s_k\not\in {\text{Bad}} $.

Proof We prove that $ \lnot {\text{legal}}(h) $ iff $ \exists k. s_k\in {\text{Bad}} $. By definition of $ {\text{Bad}} $, if a state $ s = (q,\sigma_1,\dots,\sigma_n) $ is in Bad, it means that (a) $ q\not\in F $ or (b) there exists $ i $ such that $ \sigma_i\not\in F_i $. Let us consider a generic time step $ k $ and the associated state $ s_k $ in $ \rho $. For $ s_k $, Proposition (a) holds iff the sequence of $ Q $-components of the states occurring in $ \rho[0:k] $, i.e. $ r = q_0,\dots,q_k $, is a non-accepting run over $ {\cal{A}}_\varphi $ for trace $ a_1\dots a_{k} $. By definition of acceptance of a trace, the above holds iff $ a_1\dots a_{k}\in {\cal{L}}( {\cal{A}}_\varphi) $. By the correctness of the construction of $ {\cal{A}}_\varphi $, this is true iff $ a_1\dots a_{k}\not\models \varphi $. By definition of $ {\tilde \tau} $, we can rewrite the above expression as (c) $ {\text{actions}}(h')\not\models\varphi $, where $ h' = h[0:k] $. Finally, by definition of $ {\text{legal}} $, we have that for some $ k $ (c) or (b) holds iff $ \lnot {\text{legal}}(h) $□

Lemma 11 Let $ \pi $ be a policy for $ {\cal{M}}' $ and let $ {\gamma} $ be its equivalent orchestrator. Moreover, let $ \rho = s_0a_1s_1\cdots s_m\in {\text{Paths}}_{ {\cal{M}}'_\pi} $ be a finite path on $ {\cal{M}}'_\pi $ and $ h = {\tilde \tau}_{\varphi, {\cal{C}}}(\rho) $. Then, $ \rho\in {\text{Paths}}_{ {\text{Bad}}, {\cal{M}}'_\pi} $ iff $ h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} $.

Proof First, note that by Lemma 3, $ \rho\in {\text{Paths}}_{ {\cal{M}}'_\pi} $ iff $ h = {\tilde \tau}_{\varphi, {\cal{C}}}(\rho) $ is an execution of $ {\gamma} $.

$ (\Rightarrow) $ By Lemma 10, since $ \rho[m]\in {\text{Bad}} $, we have $ \lnot {\text{legal}}(h) $. Moreover, for all $ j = 0,\dots,m-1 $, all states in $ \rho[0:j] $ are not in $ {\text{Bad}} $, and therefore again by Lemma 10, for each prefix $ h[0:j]\in {\text{prefixes}}(h) $, $ {\text{legal}}(h[0:j]) $ holds. By definition of $ \overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} $, this implies that $ h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} $.

$ (\Leftarrow) $ If $ h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} $, then by definition of $ \overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} $ we have that (1) $ \lnot {\text{legal}}(h) $ and that (2) for all $ h'\in {\text{prefixes}}(h) $ such that $ h'\neq h $, we have $ {\text{legal}}(h') $. By Lemma 10, proposition (2) implies that for all $ j = 0,\dots,m-1 $, all prefixes $ \rho[0:j] $ are such that their states are not in $ {\text{Bad}} $. Moreover, from (1), and again by Lemma 10, it follows that there exists $ k\in[0,m] $ such that $ \rho[k]\in {\text{Bad}} $, but from the previous argument, $ k $ must be equal to $ m $ since states at positions smaller than $ m $ in $ \rho $ are not in $ {\text{Bad}} $. In other words, the last state in $ \rho $ is in $ {\text{Bad}} $, while all the other states along $ \rho $ are not. Therefore, by definition of $ {\text{Paths}}_{ {\text{Bad}}, {\cal{M}}'_\pi} $, we have $ \rho\in {\text{Paths}}_{ {\text{Bad}}, {\cal{M}}'_\pi} $.□

Lemma 12 Let $ \pi $ be a policy for $ {\cal{M}}' $ and let $ {\gamma} $ be its equivalent orchestrator. Moreover, let $ \rho = s_0a_1s_1\cdots s_m\in {\text{Paths}}_{ {\cal{M}}'_\pi} $ be a finite path on $ {\cal{M}}'_\pi $ and $ h = {\tilde \tau}_{\varphi, {\cal{C}}}(\rho) $. Then, $ \rho\in {\text{Paths}}_{\lnot {\text{Bad}}, {\cal{M}}'_\pi} $ if $ h\in {{\cal{H}}^{\varphi,{\text{safe}}}}_{ {\gamma}, {\cal{C}}} $.

Proof By definition of $ {{\cal{H}}^{\varphi,{\text{safe}}}}_{ {\gamma}, {\cal{C}}} $, $ h\in {{\cal{H}}^{\varphi,{\text{safe}}}}_{ {\gamma}, {\cal{C}}} $ iff $ {\text{legal}}(h) $. By Lemma 10, $ {\text{legal}}(h) $ iff for all $ k = 0\dots m $, $ \rho[k]\not\in {\text{Bad}} $. By definition of $ {\text{Paths}}_{\lnot {\text{Bad}}, {\cal{M}}'_\pi} $, this holds iff $ \rho\in {\text{Paths}}_{\lnot {\text{Bad}}, {\cal{M}}'_\pi} $. □

Theorem 7 Let $ ( {\cal{C}}, \varphi) $ be an instance of Problem 2, and let $ {\cal{M}}' $ be the composition MDP for $ {\cal{C}} $ and $ \varphi $. We have that $ \pi $ is optimal (w.r.t. Eq. [25]) iff its equivalent orchestrator $ {\gamma} $ is optimal (w.r.t. Eq. [21]).

Proof First, we show that $ \pi = \arg\max_{\pi'} {\mathbb{P}}_{ {\cal{M}}'_\pi,s'_0}(\Box \lnot {\text{Bad}}) $ iff $ {\gamma} = \arg\max_{ {\gamma}'} {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}( {\gamma}') $. For any pair $ \pi $ and its equivalent $ {\gamma} $, we have:

$ {\mathbb{P}}_{ {\cal{M}}'_\pi,s'_0}(\Box \lnot {\text{Bad}}) = {\mathbb{P}}_{ {\cal{M}}'_{\pi},s'_0}( {\text{Paths}}^\omega_{\lnot {\text{Bad}}, {\cal{M}}'_\pi}(s'_0)) $

(26)

$ = 1 - {\mathbb{P}}_{ {\cal{M}}'_{\pi},s'_0}( {\text{Paths}}^\omega_{ {\text{Bad}}, {\cal{M}}'_\pi}(s'_0)) $

(27)

$ = 1 - \sum\limits_{\rho'\in {\text{Paths}}_{ {\text{Bad}}, {\cal{M}}'_\pi}(s'_0)} {\mathbb{P}}_{ {\cal{M}}'_{\pi},s'_0}( {\text{Paths}}^\omega_{ {\cal{M}}'_\pi}(\rho')) $

(28)

$ = 1 - \sum\limits_{h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}}} {\mathbb{P}}_{ {\gamma}, {\cal{C}}}( {\cal{T}}_{ {\gamma}, {\cal{C}}}(h)) $

(29)

$ = 1 - {\mathbb{P}}_{ {\gamma}, {\cal{C}}}\bigg(\bigcup\limits_{h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}}} {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) \bigg) $

(30)

$ = {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}( {\gamma}) $

(31)

where step 26 is by definition of probabilistic safety, step 27 is by Lemma 9, step 28 is by disjointness of all finite paths $ \rho'\in {\text{Paths}}_{ {\text{Bad}}, {\cal{M}}'_\pi} $ that end in a $ {\text{Bad}}$ state, step 29 is by Lemma 4 and 11, step 30 is by disjointness of all $ {\cal{T}}_{ {\gamma}, {\cal{C}}}(h) $ for $ h\in\overline{ {\cal{H}}^{\varphi}_{ {\gamma}, {\cal{C}}}} $, and step 31 is by definition of $ {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi} $ (Eq. [18]) and by Eq. (24). From this, we obtain that $ \pi^* = \arg\max_{\pi'} {\mathbb{P}}_{ {\cal{M}}'_\pi,s'_0}(\Box \lnot {\text{Bad}}) $ iff $ {\gamma}^* = \arg\max_{ {\gamma}'} {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}( {\gamma}') $.

It remains to prove that $ \pi $ is cost-optimal iff $ {\gamma} $ is cost-optimal. In the following, we use $ {\text{Paths}}^{(m)} $ to denote paths over $ {\cal{M}}' $ of length $ m $. By definition of the expected conditional finite horizon cost $ {\mathbb{E}}_{ {\cal{M}}'_\pi,s'_0}[ {{\text{Cost}}_m} \mid \Box\lnot {\text{Bad}}] $, using the law of total expectation, we get:

$\begin{split} {\mathbb{E}}_{ {\cal{M}}'_\pi,s'_0}( {{\text{Cost}}_m} \mid \Box\lnot {\text{Bad}}) =\;& \sum\limits_{\rho\in {\text{Paths}}^{(m)}_{ {\cal{M}}'_\pi}} {\mathbb{E}}\left[ {{\text{Cost}}_m} \mid {\text{Paths}}^\omega_{ {\cal{M}}'_\pi}(\rho), \Box\lnot {\text{Bad}}\right]\\& {\mathbb{P}}_{ {\cal{M}}'_\pi}\left( {\text{Paths}}^\omega_{ {\cal{M}}'_\pi}(\rho) \mid \Box\lnot {\text{Bad}}\right)\end{split} $

Intuitively, we are averaging the total costs of all paths of length $ m $ that never visit $ {\text{Bad}} $ by the probability of those paths being visited under policy $ \pi $ in $ {\cal{M}} $. Since $ {{\text{Cost}}_m} $ depends only on the first $ m $ steps, it is constant on each cylinder; hence, $ {\mathbb{E}}[ {{\text{Cost}}_m}\mid {\text{Paths}}^\omega_{ {\cal{M}}'_\pi}(\rho), \Box\lnot {\text{Bad}}] = {{\text{Cost}}_m} = \sum_{k = 0}^{m-1} C'(s_k',a_{k+1}) $. Moreover, by applying the definition of conditional probability ($ P(A\mid B) = \frac{P(A\cap B)}{P(B)} $), and by observing that paths that do not start with $ s_0' $ have probability $ 0 $, we get:

$ \sum\limits_{\rho\in {\text{Paths}}^{(m)}_{ {\cal{M}}'_\pi}(s_0')} \Bigg(\sum\limits_{k = 0}^{m-1} C'(s_k',a_{k+1})\Bigg) \frac{ {\mathbb{P}}_{ {\cal{M}}'_\pi}\left( {\text{Paths}}^\omega_{ {\cal{M}}'_\pi}(\rho) \cap \Box\lnot {\text{Bad}}\right)} { {\mathbb{P}}_{ {\cal{M}}'_\pi}\left(\Box\lnot {\text{Bad}}\right)} $

Note that the denominator $ {\mathbb{P}}_{ {\cal{M}}'_\pi,s_0'}(\Box\lnot {\text{Bad}} ) $ is a normalization factor due to the condition $ \Box\lnot {\text{Bad}} $ in the conditional probability. Moreover, note that $ {\mathbb{P}}_{M'_\pi,s_0'}(\Box\lnot {\text{Bad}}) = {\mathbb{P}}( {\text{Paths}}^\omega_{\lnot {\text{Bad}}, {\cal{M}}'_\pi}(s_0')) $; hence, the above expression can be rewritten as:

$ \sum\limits_{\rho\in {\text{Paths}}^{(m)}_{ {\cal{M}}'_\pi}(s_0')} \Bigg(\sum\limits_{k = 0}^{m-1} C'(s_k',a_{k+1})\Bigg) \frac{ {\mathbb{P}}_{ {\cal{M}}'_\pi,s_0'}\left( {\text{Paths}}^\omega_{ {\cal{M}}'_\pi}(\rho) \cap {\text{Paths}}^\omega_{\lnot {\text{Bad}}, {\cal{M}}'_\pi}(s_0')\right)}{ {\mathbb{P}}_{ {\cal{M}}'_\pi,s_0'}( {\text{Paths}}^\omega_{\lnot {\text{Bad}}, {\cal{M}}'_\pi}(s_0'))} $

Note that whenever $ \rho\in( {\text{Paths}}^{(m)}_{ {\cal{M}}'_\pi}(s_0')\setminus {\text{Paths}}_{\lnot {\text{Bad}}, {\cal{M}}'_\pi}) $, the corresponding term in the sum is zero. Therefore, we can rewrite the summation condition as:

$ \sum\limits_{\rho\in {\text{Paths}}^{(m)}_{\lnot {\text{Bad}}, {\cal{M}}'_\pi(s_0')}} \Bigg(\sum\limits_{k = 0}^{m-1} C'(s_k',a_{k+1})\Bigg) \frac{ {\mathbb{P}}_{ {\cal{M}}'_\pi,s_0'}( {\text{Paths}}^\omega_{ {\cal{M}}'_\pi}(\rho))}{ {\mathbb{P}}_{ {\cal{M}}'_\pi,s_0'}( {\text{Paths}}^\omega_{\lnot {\text{Bad}}, {\cal{M}}'_\pi}(s_0'))} $

By Lemma 12, there is a bijection between paths in $ \rho\in {\text{Paths}}^{(m)}_{ {\cal{M}}'_\pi}(s_0') $ and histories in $ h\in {{\cal{H}}^{\varphi,{\text{safe}}}}_{ {\gamma}, {\cal{C}}} $ with $ |h| = m $. By Lemma 4, the probability of the cylinder sets of $ \rho $ and $ h $ is the same. Moreover, by construction of $ {\cal{M}}' $, for all $ k = 0,\dots,m-1 $, the terms of the form $ C'(s_k',a_{k+1}) $ in the sum of the costs can be rewritten as $ C_{o_{k+1}}(\sigma_{o_{k+1},k}, a_{k+1}) $, i.e., the cost of executing action $ a_{k+1} $ with the $ o_{k+1} $-th service in its current state. Finally, by the first part of the proof (steps 26–31), we have that $ {\mathbb{P}}_{ {\cal{M}}'_\pi,s_0'}(\Box\lnot {\text{Bad}}) = {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}( {\gamma}) $ and that the probability of a path with no bad states in $ {\cal{M}} $ is the same as the probability of the corresponding trace $ t $ being legal in $ {\cal{C}} $. Therefore, we can rewrite the expression as follows:

$ \begin{split}& \sum\limits_{h\in {{\cal{H}}^{\varphi,{\text{safe}}}}_{ {\gamma}, {\cal{C}}}: |h| = m} \Bigg(\sum\limits_{k = 0}^{m-1} C_{o_{k+1}}(\sigma_{o_{k+1},k},a_{k+1})\Bigg) \frac{ {\mathbb{P}}_{ {\cal{M}}'_\pi,s_0'}( {\cal{T}}_{ {\gamma}, {\cal{C}}}(h))}{ {\cal{P}}^{ {\text{safe}}}_{ {\cal{C}},\varphi}( {\gamma})}\\ =\;& {\mathbb{E}}_{h\sim {\mathbb{P}}_{ {\gamma}, {\cal{C}}}}\Bigg[ {{\text{Cost}}^{{\rm{C}}}_m} \vert\ {\text{legal}}(t) \Bigg] \end{split} $

From the above, it follows that

$\begin{split} {\cal{J}}^{ {{\text{avg}\text-\text{cost}}}}_{ {\cal{C}},\varphi}( {\gamma}) =\;& \limsup\limits_{m\to\infty} \frac{1}{m}{\mathbb{E}}_{h\sim\mathbb{P}_{ {\gamma}, {\cal{C}}}}\Bigg[ {{\text{Cost}}^{{\rm{C}}}_m} \bigg\vert {\text{legal}}(h)\Bigg] \\=\;& \limsup\limits_{m\to\infty} \frac{1}{m} {\mathbb{E}}_{\rho\sim {\cal{M}}'_{\pi},s'_0}(MC \mid \Box\lnot {\text{Bad}}), \end{split}$

since we are taking the limit of equal terms. Therefore, $ \pi\in\arg\inf_{\pi'} \limsup\limits_{m\to\infty} \frac{1}{m} {\mathbb{E}}_{\rho\sim {\cal{M}}'_{\pi'},s'_0}(MC \mid \Box\lnot {\text{Bad}}) $ if $ {\gamma}\in \arg\inf_{ {\gamma}'} {\cal{J}}^{ {{\text{avg}\text{-}\text{cost}}}}_{ {\cal{C}},\varphi}( {\gamma}') $. Combining both results, we get the thesis.□

Computational cost. Theorem 7 guarantees that we can reduce Problem 2 to the problem of finding an optimal policy for the lexicographic bi-objective optimization problem (Eq. [6]) over the composition MDP $ {\cal{M}}' $, where this time the first objective is to maximize the probability of avoiding a set of states in an MDP, and as a second objective, we minimize the expected conditional mean-cost.

As explained above, the two-stage technique requires solving a planning problem over MDPs. Since it is known that both steps require polynomial time complexity in the number of states and actions of the MDP^[24] and that our Composition MDP has a state space that is a single-exponential in the size of the goal specification, we get this result:

Theorem 8: Problem 2 can be solved in at most double-exponential time in the size of the formula, in at most exponential time in the number of services, and in polynomial time in the size of the services.

[1]	Papazoglou MP, Traverso P, Dustdar S, Leymann F. 2007. Service-oriented computing: state of the art and research challenges. Computer 40(11):38−45 doi: 10.1109/MC.2007.400 CrossRef Google Scholar
[2]	Berardi D, Calvanese D, De Giacomo G, Lenzerini M, Mecella M. 2003. Automatic composition of E-services that export their behavior. In Service-Oriented Computing - ICSOC 2003. Lecture Notes in Computer Science. Vol. 2910. Berlin, Heidelberg: Springer. pp. 43–58 doi: 10.1007/978-3-540-24593-3_4
[3]	Berardi D, Calvanese D, De Giacomo G, Mecella M. 2005. Composition of services with nondeterministic observable behavior. In Service-Oriented Computing ICSOC2005. Lecture Notes in Computer Science. Vol. 3826. Berlin, Heidelberg: Springer. pp. 520–526 doi: 10.1007/11596141_43
[4]	De Giacomo G, Mecella M, Patrizi F. 2013. Automated service composition based on behaviors: the Roman model. In Web Services Foundations. New York, NY: Springer. pp. 189–214 doi: 10.1007/978-1-4614-7518-7_8
[5]	Monti F, Silo L, Leotta F, Mecella M. 2023. On the suitability of AI for service-based adaptive supply chains in smart manufacturing. 2023 IEEE International Conference on Web Services (ICWS). July 2–8, 2023. Chicago, IL, USA. USA: IEEE. pp. 704–706 doi: 10.1109/icws60048.2023.00091
[6]	Monti F, Silo L, Leotta F, Mecella M. 2023. Services in smart manufacturing: comparing automated reasoning techniques for composition and orchestration. Service-Oriented Computing. SummerSOC 2023. Communications in Computer and Information Science. vol 1847. Cham: Springer. pp. 69–83 doi: 0.1007/978-3-031-45728-9_5
[7]	De Giacomo G, Favorito M, Leotta F, Mecella M, Silo L. 2023. Digital twin composition in smart manufacturing via Markov decision processes. Computers in Industry 149:103916 doi: 10.1016/j.compind.2023.103916 CrossRef Google Scholar
[8]	Yadav N, Sardina S. 2011. Decision theoretic behavior composition. AAMAS '11: The 10^th International Conference on Autonomous Agents and Multiagent Systems, 2−6 May, 2011, Taipei, Taiwan, China. USA: International Foundation for Autonomous Agents and Multiagent Systems. pp. 575−582 doi: 10.65109/zgyz4787
[9]	Brafman RI, De Giacomo G, Mecella M, Sardina S. 2017. Service composition in stochastic settings. AIIA 2017 Advances in Artificial Intelligence. Lecture Notes in Computer Science*. Cham: Springer. pp. 159–171 doi: 10.1007/978-3-319-70169-1_12
[10]	Marrella A, Mecella M, Sardiña S. 2018. Supporting adaptiveness of cyber-physical processes through action-based formalisms. AI Communications 31(1):47−74 doi: 10.3233/aic-170748 CrossRef Google Scholar
[11]	De Giacomo G, Vardi MY. 2013. Linear temporal logic and linear dynamic logic on finite traces. IJCAI '13: Proceedings of the Twenty-Third international joint conference on Artificial Intelligence, 3–9 August, 2013, Beijing, China. USA: AAAI Press. pp. 854–860 https://dl.acm.org/doi/10.5555/2540128.2540252
[12]	Debois S, Hildebrandt TT, Laursen PH, Ulrik KR. 2007. Declarative process mining for DCR graphs. SAC '17: Proceedings of the Symposium on Applied Computing, 3−7 April, 2017, Marrakech Morocco. USA: Association for Computing Machinery. pp. 759–764 doi: 10.1145/3019612.3019622
[13]	Di Ciccio C, Montali M. 2022. Declarative process specifications: reasoning, discovery, monitoring. In Process Mining Handbook. Lecture Notes in Business Information Processing. Vol. 448. Cham: Springer. pp. 108–152 doi: 10.1007/978-3-031-08848-3_4
[14]	Dumas M, Fournier F, Limonad L, Marrella A, Montali M, et al. 2023. AI-augmented business process management systems: a research manifesto. ACM Transactions on Management Information Systems 14(1):1−19 doi: 10.1145/3576047 CrossRef Google Scholar
[15]	De Giacomo G, Favorito M, Leotta F, Mecella M, Silo L. 2021. Digital twins composition via Markov decision processes. Proceedings of the 1st Italian Forum on Business Process Management, co-located with the 19^th International Conference of Business Process Management (BPM 2021). Rome, Italy, 10^th September, 2021. Vol. 2952. CEUR Workshop Proceedings. pp. 44–49 https://ceur-ws.org/Vol-2952/paper_297a.pdf
[16]	De Giacomo G, Favorito M, Leotta F, Mecella M, Silo L. 2022. Modeling resilient cyberphysical processes and their composition from digital twins via Markov Decision Processes. In Proceedings of the Workshop on Process Management in the AI Era (PMAI 2022), co-located with 31st International Joint Conference on Artificial Intelligence and the 25^th European Conference on Artificial Intelligence (IJCAI-ECAI 2022). Wien, Austria, 23 July, 2022. Vol. 3310. CEUR Workshop Proceedings. pp. 101–104 https://ceur-ws.org/Vol-3310/paper7.pdf
[17]	De Giacomo G, Favorito M, Leotta F, Mecella M, Monti F, et al. 2023. AIDA: a tool for resiliency in smart manufacturing. Intelligent Information Systems. CAiSE 2023. Lecture Notes in Business Information Processing. Vol. 477. Cham: Springer. pp. 112–120 doi: 10.1007/978-3-031-34674-3_14
[18]	Chatterjee K, Majumdar R, Henzinger TA. 2006. Markov decision processes with multiple objectives. In: STACS Lecture Notes in Computer Science. Vol. 3884. Berlin, Heidelberg: Springer. pp. 325–336 doi: 10.1007/11672142_26
[19]	Chatterjee K, Katoen JP, Weininger M, Winkler T. 2020. Stochastic games with lexicographic reachability-safety objectives. In Computer Aided Verification. CAV 2020. Lecture Notes in Computer Science. Vol. 12225. Cham: Springer, 2020, pp. 398–420 doi: 10.1007/978-3-030-53291-8_21
[20]	Hahn EM, Perez M, Schewe S, Somenzi F, Trivedi A, et al. 2021. Model-free reinforcement learning for lexicographic omega-regular objectives. In Formal Methods. FM 2021. Lecture Notes in Computer Science. Vol. 13047. Lecture Notes in Computer Science. Cham: Springer. pp. 142–159 doi: 10.1007/978-3-030-90870-6_8
[21]	Skalse J, Hammond L, Griffin C, Abate A. 2022. Lexicographic multi-objective reinforcement learning. Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-ECAI 2022). pp. 3430–3436 www.ijcai.org/proceedings/2022/0476.pdf
[22]	Wray KH, Zilberstein S, Mouaddib AI. 2015. Multi-objective MDPs with conditional lexicographic reward preferences. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. 25−30 January, 2015, Austin, Texas. USA: AAAI Press. pp. 3418−3424 https://ojs.aaai.org/index.php/AAAI/article/view/9647
[23]	Skalse J, Abate A. 2023. On the limitations of Markovian rewards to express multiobjective, risk-sensitive, and modal tasks. In 39^th Conference on Uncertainty in Artificial Intelligence, 31^st July–4^th August, 2023, Pittsburgh, PA, USA. Vol. 216. pp. 1974–1984 https://dl.acm.org/doi/10.5555/3625834.3626019
[24]	Puterman ML. 1994. Markov decision processes: discrete stochastic dynamic programming. New York, USA: John Wiley & Sons. 672 pp. https://dl.acm.org/doi/10.5555/528623
[25]	Gao A, Yang D, Tang S, Zhang M. 2005. Web service composition using Markov Decision Processes. In WAIM. Vol. 3739. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2005, pp. 308–319 doi: 10.1007/11563952_28
[26]	Chen K, Xu J, Reiff-Marganiec S. 2009. Markov-HTN planning approach to enhance flexibility of automatic web service composition. 2009 IEEE International Conference on Web Services. 6–10 July, 2009, Los Angeles, CA, USA. USA: IEEE. pp. 9–16 doi: 10.1109/ICWS.2009.43
[27]	Moustafa A, Zhang M. 2013. Multi-objective service composition using reinforcement learning. In Service-Oriented Computing. ICSOC 2013. Lecture Notes in Computer Science. Vol. 8274. Berlin, Heidelberg: Springer. pp. 298–312 doi: 10.1007/978-3-642-45005-1_21
[28]	YChen Y, Huang J, Lin C, Shen X. 2019. Multi-objective service composition with QoS dependencies. IEEE Transactions on Cloud Computing 7(2):537−552 doi: 10.1109/tcc.2016.2607750 CrossRef Google Scholar
[29]	Sadeghiram S, Ma H, Chen G. 2020. A user-preference driven lexicographic approach for multi-objective distributed Web service composition. 2020 IEEE Symposium Series on Computational Intelligence (SSCI). 1−4 December, 2020, Canberra, ACT, Australia. USA: IEEE. pp. 791−797 doi: 10.1109/ssci47803.2020.9308222
[30]	De Giacomo G, Favorito M, Silo L. 2024. Composition of stochastic services for LTL_f goal specifications. In Foundations of Information and Knowledge Systems. Lecture Notes in Computer Science. Vol. 14589. Cham: Springer. pp. 298–316 doi: 10.1007/978-3-031-56940-1_17
[31]	De Giacomo G, Vardi MY. 2015. Synthesis for LTL and LDL on finite traces. IJCAI'15: Proceedings of the 24th International Conference on Artificial Intelligence, July 25–31, 2015, Buenos Aires, Argentina. USA: AAAI Press. pp. 1558–1564 https://dl.acm.org/doi/10.5555/2832415.2832466
[32]	De Giacomo G, Di Stasio A, Tabajara LM, Vardi MY, Zhu S. 2022. Finite-trace and generalized-reactivity specifications in temporal synthesis. Formal Methods in System Design 61(2):139−163 doi: 10.1007/s10703-023-00413-2 CrossRef Google Scholar
[33]	Aminof B, De Giacomo G, Di Stasio A, Francon H, Rubin S, et al. 2025. LTL_f synthesis under environment specifications for reachability and safety properties. Information and Computation 303:105255 doi: 10.1016/j.ic.2024.105255 CrossRef Google Scholar
[34]	Aminof B, De Giacomo G, Rubin S, Vardi MY. 2025. LTLf+ and PPLTL+: extending LTLf and PPLTL to infinite traces. In: Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence. 16–22 August 2025, Montreal, Canada, with satellite event in Guangzhou, China, 29–31 August 2025. ijcai.org, 2025, pp. 8447–8455 www.ijcai.org/proceedings/2025/0939.pdf
[35]	Busatto-Gaston D, Chakraborty D, Majumdar A, Mukherjee S, Pérez GA, et al. 2023. Bi-objective lexicographic optimization in markov decision processes with related objectives. In Automated Technology for Verification and Analysis. ATVA 2023. Lecture Notes in Computer Science. Vol. 14215. Cham: Springer. pp. 203–223 doi: 10.1007/978-3-031-45329-8_10
[36]	Fionda V, Greco G. 2018. LTL on finite and process traces: complexity results and a practical reasoner. Journal of Artificial Intelligence Research 63:557−623 doi: 10.1613/jair.1.11256 CrossRef Google Scholar
[37]	Brafman RI, De Giacomo G, Patrizi F. 2018. LTLf/LDLf non-markovian rewards. Artificial Intelligence 331:1771−1778 doi: 10.1016/j.artint.2024.104113 CrossRef Google Scholar
[38]	De Alfaro L. 1997. Formal verification of probabilistic systems. PhD thesis. Stanford University, USA. https://searchworks.stanford.edu/view/3910936.
[39]	Courcoubetis C, Yannakakis M. 1995. The complexity of probabilistic verification. Journal of the ACM 42(4):857−907 doi: 10.1145/210332.210339 CrossRef Google Scholar
[40]	Wells AM, Lahijanian M, Kavraki LE, Vardi MY. 2020. LTLf synthesis on probabilistic systems. Electronic Proceedings in Theoretical Computer Science 326:166−181 doi: 10.4204/eptcs.326.11 CrossRef Google Scholar
[41]	De Giacomo G, Favorito M, Silo L. 2025. Service composition for ltl task specifications. Information Systems 133:102571 doi: 10.1016/j.is.2025.102571 CrossRef Google Scholar
[42]	De Giacomo G, De Masellis R, Montali M. 2014. Reasoning on LTL on finite traces: insensitivity to infiniteness. Proceedings of the AAAI Conference on Artificial Intelligence 28:1027−1033 doi: 10.1609/aaai.v28i1.8872 CrossRef Google Scholar
[43]	Kwiatkowska M, Norman G, Parker D. 2011. PRISM 4.0: verification of probabilistic real-time systems. In Computer Aided Verification. CAV 2011. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer. pp. 585−591 doi: 10.1007/978-3-642-22110-1_47
[44]	Hensel C, Junges S, Katoen JP, Quatmann T, Volk M. 2022. The probabilistic model checker storm. International Journal on Software Tools for Technology Transfer 24(4):589−610 doi: 10.1007/s10009-021-00633-z CrossRef Google Scholar

{{lists.name}}

Stochastic service composition for ʟᴛʟ_f reachability and safety tasks

Abstract

Rights and permissions

References

About this article

Cite this article

Article Metrics

Access History

Other Articles By Authors