Operational reliability assessment of an aircraft environmental control system
K. Jenab, K. Rashidi
Department of Mechanical and Industrial Engineering, Ryerson University, Toronto, Ontario, Canada M5B 2K3
Article info
Article history:
Received 29 August 2007
Received in revised form 14 February 2008
Accepted 9 May 2008
Available online 17 May 2008
Keywords:
Hybrid warm–cold standby system
Human errors
Flow-graph
Maintenance optimization
Abstract
The aircraft environmental control system (ECS) is composed of several non-identical and
non-dedicated subsystems working as warm–cold standby subsystems, and their state transition
times are arbitrarily distributed. This paper presents a flow-graph-based method to calculate
time-to-failure data and the failure probability of the ECS. The data obtained from the model
may be used for maintenance optimization that employs the failure limit strategy for the ECS.
The model incorporates detectable failures such as hardware failures, critical human errors,
and common-cause failures, as well as maintenance categories and switch activation methods.
A numerical example is also presented to demonstrate the application of the model.
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction
In [1], the authors developed probabilistic reliability models taking into account
common-cause failures, human errors, and partially energized standby subsystems. In [2], using
a Markovian method, the author developed formulae for the availability of a standby system
composed of identical units that are preventively maintained. In [3], the authors developed a
closed-form equation for a k-out-of-n warm standby system with dormant failure. In [4], the
authors presented a human error analysis model with arbitrarily distributed repair times for a
system composed of two working units and one standby unit. In [5], using the exact
distribution of the sum of two independent beta variables, the reliability of a standby system
composed of units with beta-distributed lifetimes was calculated. In [6], a two-unit standby
system was investigated wherein the standby unit is kept in a cold state for a certain amount
of time before it is allowed to become warm. In [7], using a Markovian model, the authors
analyzed a standby system in which n units operate in parallel and the remaining m units are
in standby mode. In [8], considering a constant failure rate, the availability of a standby
system with n+1 identical units and one standby unit was studied. In [9], several measures of
reliability for a two-unit warm standby system with a slow switch, considering hardware and
human error failures, were assessed by a Markovian method. In [10], using the regenerative
point technique in Markov renewal processes, the reliability measures for a two-unit warm
standby system with a slow switch subject to hardware and human error failures were studied.
In [11], assuming imperfect repair, an optimal geometric model for a cold standby repairable
system with only two identical units was developed. In [12], a system composed of m operating
units, w warm standby units and R repairmen was studied. In [13], using reliability
techniques, a statistical method was proposed for preventive maintenance of standby relays in
power systems. In [14], the author developed imprecise reliability models of a cold standby
system for cases where the precise probability distribution of the unit times to failure is
unavailable. In [15], considering minimal repair with negligible repair time, the lifetime of
a standby system was studied. In [16], a warm standby system composed of units with different
failure and repair rates was studied. In [17], a cold standby system made up of non-repairable
units with Erlang-distributed lifetimes was investigated. Table 1 classifies the published
literature dealing with standby systems subject to human errors, common-cause (CC) failures,
and hardware failures (HFs). This classification shows that the published papers overlooked
the warm–cold standby system, defined here as a hybrid standby system with arbitrary failure
distributions. Therefore, the aim of this study is to investigate such a standby system and to
calculate time-to-failure data and the system failure probability by using a flow-graph-based
method. The results are required for maintenance optimization employing the
failure limit strategy [18].
Corresponding author. Tel.: +1 416 979 5000x6424; fax: +1 416 979 5265. E-mail
address: jenab@ryerson.ca (K. Jenab).
0951-8320/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.ress.2008.05.003
Table 1
Classification of published literature on standby systems
Failure distribution   Ref.   Configuration of standby   Model
With human errors
Exponential            [1]                               Markov
Exponential            [4]    Cold                       Markov
Exponential            [7]    Warm                       Markov
Exponential            [9]    Warm                       Mathematical expansion
Exponential            [10]   Warm                       Mathematical expansion
Without human errors
Exponential            [2]    Cold                       Markov/simulation
Exponential            [3]    Warm                       Mathematical expansion
Beta                   [5]    Cold                       Mathematical expansion
Exponential            [6]    Hot/warm/cold              Mathematical expansion
Exponential            [8]    Cold                       Markov
Exponential            [11]                              Mathematical expansion
Exponential            [12]   Warm                       Mathematical expansion
Exponential            [13]   Cold                       Mathematical expansion
Exponential            [14]   Cold                       Mathematical expansion
Weibull                [15]   Cold                       Simulation
Exponential            [16]   Warm                       Markov
Erlang                 [17]   Cold                       Mathematical expansion
2. Problem description
One of the major functions of the aircraft environmental control system (ECS) is to maintain
the temperature of the cabin at passenger comfort level. This hybrid standby system is made up
of two dedicated cooling packs and a non-dedicated RAM air subsystem, as shown in Fig. 1. The
non-dedicated subsystem can be manually activated by the flight crew only at certain flight
altitudes. In normal operation, both cooling packs operate automatically. If one pack fails,
the remaining pack automatically takes over the load and an appropriate message is posted on
the crew alert system (CAS) display in the aircraft cockpit.
The system state is reversible if a cooling pack fails due to overheat. Up to this stage, we
have a two-unit warm standby system with an automatic switch subject to HFs. If both packs
fail, the crew is instructed to switch manually to RAM air in accordance with the aircraft
flight manual. This stage is a cold standby with a slow switch subject to hardware and crew
error failures. Crew errors may result from poor control panel design, poor work environment,
poor task assignment, inadequate training, poorly written flight manuals and operating
procedures, and deficiencies in the master minimum equipment list. Therefore, the required
maintenance activities performed by the flight crews during the flight (Category I) or by
maintenance crews at the bases (Category II) may be associated with human errors.
To calculate time-to-failure data and the failure probability of such a system, we define a
warm–cold (hybrid) standby system with related terms and conditions in the remainder of this
section. The warm–cold standby system comprises several non-identical and non-dedicated
subsystems, which function independently; however, they can be manually or automatically
activated subject to
(1) the manual override option,
(2) subsystem availability if it is non-dedicated, and
(3) meeting certain flight operational requirements such as flight altitude.
The dedicated subsystems work in a warm standby configuration with automatic switch
activation. On the other hand, the non-dedicated subsystems work in a cold standby
configuration with manual activation in accordance with the flight manual. From a maintenance
point of view, the subsystems may be repaired during the flight by the flight crews or at the
base by the maintenance crews based on maintenance procedures. The maintenance task may be
subject to critical human errors (CHEs) resulting from poor design, poor work environment,
poor task assignment, inadequate training, and poorly written manuals, operating procedures,
and maintenance procedures. Also, there exist CC failures that may be caused by a common
design or material deficiency, a common installation error, a common maintenance error, or a
common harsh environment. This type of failure leads to total system failure. The third type
of failure is HF (i.e., subsystem failure and switch failure), which can be classified into
detectable and non-detectable HFs. The occurrence of a detectable failure is announced by a
warning, caution, or advisory message on the CAS display in the aircraft cockpit so that
appropriate action can be taken in accordance with the flight and maintenance manuals. In
contrast, a non-detectable failure remains dormant until the next maintenance inspection.
Fig. 1. Environmental control system block diagram.
To develop an analytical model for computing time-to-failure data and the failure probability
of such a system, the following assumptions are made:
- Dedicated subsystems serve only the warm–cold standby system.
- Non-dedicated subsystems may serve the warm–cold standby system subject to availability and
  meeting the operational requirements.
- Category I maintenance tasks can be performed by the flight crews.
- Category II maintenance tasks require maintenance crews on site.
- Common-cause and HF rates are arbitrarily distributed and statistically independent.
- Human errors, common-cause failures, and HFs occur independently.
- The repair times of the failed subsystems are arbitrarily distributed.
- The repair times of the failed system are arbitrarily distributed.
- A common-cause failure or a human error can occur and trigger system failure from any of
  the operable states.
- Repair is unrestricted for subsystems and system.
- A repaired subsystem or system is as good as new.
- The switchover mechanism is automatic in the warm standby configuration and manual in the
  cold standby configuration.
- Self-loops in the flow graph, which represent that the corresponding subsystem is working
  properly, are s-independent [19].
2.1. Notation
t           transition time due to failures (i.e., hardware, human error, common cause)
ECS         environmental control system
CAS         crew alert system
CC          common cause
CHE         critical human errors in operation or maintenance
HW          hardware failure including subsystem and switch
ℓ           link index, i.e., CC, CHE, HW
P_ij^ℓ      probability of transition from node i to node j in link ℓ
P_std       failure probability of the hybrid standby system
Av_ij^ℓ     availability of non-dedicated subsystem for switchover in transition ij and link ℓ
MTTF        mean time to failure of the hybrid standby system
STTF        standard deviation of time to failure of the hybrid standby system
f_ij^ℓ(t)   time distribution function for transition from state i to state j with index ℓ
r_ij^ℓ(t)   time distribution function for switch activation in transition from state i to
            state j with index ℓ
W_ij        equivalent transition from state i to state j
W_std       equivalent transition of the hybrid standby system model
L_m         the mth first-order loop in the model
t           total number of disjoint loops in the model
3. Model description
The analytical model is based on the flow-graph concept presented in Appendix A. The flow
graph is made up of nodes corresponding to the states of the warm–cold standby ECS and of
links corresponding to transitions among states. Each link, represented by an arrow, is
associated with a transition probability, a transition time distribution, a switch type
(automatic, manual), switch activation requirements (availability, operational procedure), and
a transition description. Fig. 2 presents a state transition without a switch type and switch
activation requirements, as used for the CC failure and maintenance links. If there is more
than one link between two states, we use the link index ℓ to differentiate them, where the sum
of the probabilities of the outgoing links is equal to one. The link description indicates
either the type of failure ('CC', critical human errors in flight or maintenance 'CHE', 'HW')
or the category of maintenance activity (Category I, Category II). Category I is a simple
maintenance procedure that may be performed by the flight crews. However, if a maintenance
activity requires special tools and skills, it falls into Category II, which can be
performed by the maintenance crews at the base.
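The link attributes described above map naturally onto a small record type. The following Python sketch is illustrative only: the `Link` class, its field names, the helper `outgoing_probability_sums`, and the probabilities are assumptions made for this example, not structures or values taken from the ECS model. It checks the stated constraint that the probabilities of the links leaving a state sum to one.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Link:
    source: int                          # state the link leaves
    target: int                          # state the link enters
    probability: float                   # transition probability
    mgf: Callable[[float], float]        # moment generating function of the transition time
    description: str                     # e.g. "CC", "CHE", "HW", "Category I"
    switch_type: Optional[str] = None    # "automatic", "manual", or None
    requirement: Optional[str] = None    # e.g. "availability"


def outgoing_probability_sums(links):
    """Sum the transition probabilities of the links leaving each state."""
    totals = defaultdict(float)
    for link in links:
        totals[link.source] += link.probability
    return dict(totals)


# Hypothetical links leaving state 1 (values chosen only for illustration)
links = [
    Link(1, 1, 0.92, lambda s: math.exp(100.0 * s), "working (self-loop)"),
    Link(1, 2, 0.06, lambda s: math.exp(0.001 * s), "HW", switch_type="automatic"),
    Link(1, 4, 0.02, lambda s: 1.0, "CC"),
]
sums = outgoing_probability_sums(links)
print(sums)
```

A model builder could run such a check once per state before applying any reduction.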
Fig. 2. Flow-graph model for a state transition without switch type and activation requirements.
To calculate the time-to-failure data and the probability of the warm–cold standby ECS system
failure, the equivalent transition (W) of the system can be derived from the topology
equation [1]:
$$ 1 - \sum_{m=1}^{t} L_m + \sum_{m=1}^{t} \sum_{\substack{n=1 \\ m \neq n}}^{t} L_m L_n - \sum_{m=1}^{t} \sum_{n=1}^{t} \sum_{\substack{p=1 \\ m \neq n \neq p}}^{t} L_m L_n L_p + \cdots = 0 \qquad (1) $$
where t is the total number of disjoint loops in the ECS flow-graph model. The first-order
loop (i.e., L_m) is composed of only one loop and the second-order loop (i.e., L_m L_n) is
composed of the product of two disjoint loops m and n. These disjoint loops m and n have no
intersection in their nodes and links. Similarly, the tth-order loop can be defined as a
product of t disjoint loops. The probability of failure of the system is
$$ P_{std} = W_{std}\big|_{s=0} \qquad (2) $$
and the mean time to failure of the system can be obtained from the expression below:
$$ MTTF = \frac{1}{P_{std}} \left. \frac{\partial W_{std}}{\partial s} \right|_{s=0} \qquad (3) $$
Also, the standard deviation of time to failure is defined by
$$ STTF = \sqrt{ \frac{1}{P_{std}} \left. \frac{\partial^2 W_{std}}{\partial s^2} \right|_{s=0} - [MTTF]^2 } \qquad (4) $$
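Eqs. (2)–(4) can be checked numerically for any equivalent transmittance W(s). The sketch below is a minimal Python illustration that approximates the derivatives at s = 0 with central finite differences. The function name `failure_stats`, the step size, and the single-link example (probability 0.9, exponentially distributed time with rate 0.01 per hour) are assumptions for illustration; a symbolic derivative could be used instead.

```python
import math


def failure_stats(w, h=1e-5):
    """Return (P_std, MTTF, STTF) for an equivalent transmittance w(s)."""
    p_std = w(0.0)                                   # Eq. (2)
    d1 = (w(h) - w(-h)) / (2.0 * h)                  # dW/ds at s = 0
    d2 = (w(h) - 2.0 * w(0.0) + w(-h)) / (h * h)     # d2W/ds2 at s = 0
    mttf = d1 / p_std                                # Eq. (3)
    variance = d2 / p_std - mttf ** 2                # term under the root in Eq. (4)
    sttf = math.sqrt(max(variance, 0.0))             # guard against tiny negative round-off
    return p_std, mttf, sttf


# Hypothetical single link: probability 0.9, exponential time with rate 0.01 per hour,
# whose moment generating function is rate / (rate - s) for s < rate.
rate = 0.01
w_example = lambda s: 0.9 * rate / (rate - s)

p_std, mttf, sttf = failure_stats(w_example)
print(p_std, mttf, sttf)
```

For an exponential time the mean and the standard deviation both equal 1/rate, which gives a quick sanity check on the finite-difference step.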
The developed model for the warm–cold standby ECS system may be simplified by using series and
parallel reduction techniques.
3.1. Series of state transitions
Using Eq. (1), the equivalent transition for a series of state transitions depicted in Fig. 3
can be defined as follows:
$$ W_{xy} = \prod_{\substack{i=x \\ j=i+1}}^{y} P_{ij}^{\ell} \int_{t} e^{st} f_{ij}^{\ell}(t)\,dt \qquad (5) $$
where ℓ = 1, x is the starting state and y is the last state in the series.
Fig. 3. Flow-graph model for series of state transitions.
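As an illustration of the series reduction in Eq. (5), the Python sketch below multiplies the transmittances of consecutive links, each taken as a probability times the moment generating function of its transition time. The helper name `series_transmittance` and the two-link chain with fixed transition times are assumptions for this example.

```python
import math


def series_transmittance(links):
    """Equivalent transmittance of links in series: the product of p * M(s)."""
    def w(s):
        value = 1.0
        for prob, mgf in links:
            value *= prob * mgf(s)
        return value
    return w


# Hypothetical chain with fixed (zero-variance) transition times
links = [
    (0.9, lambda s: math.exp(2.0 * s)),   # probability 0.9, 2 h
    (0.5, lambda s: math.exp(3.0 * s)),   # probability 0.5, 3 h
]
w_series = series_transmittance(links)
print(w_series(0.0))   # product of the link probabilities
```

Because the exponents add, the combined link behaves like a single transition lasting 5 h with probability 0.45.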
3.2. Parallel state transitions
Fig. 4 shows parallel transitions between state i and state j, which represent transitions due
to HFs, CHEs in operation, maintenance, etc. The equivalent of these parallel transitions can
be derived from the topology equation as follows:
$$ W_{ij} = \sum_{\ell=1}^{k} P_{ij}^{\ell} \int_{t} e^{st} f_{ij}^{\ell}(t)\,dt \qquad (6) $$
where ℓ is the index for the links between state i and state j.
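The parallel reduction in Eq. (6) can be sketched in the same style: the equivalent transmittance is the sum of the individual link transmittances. The helper name `parallel_transmittance` and the two hypothetical links are assumptions for illustration; the mean time is estimated with a central finite difference.

```python
import math


def parallel_transmittance(links):
    """Equivalent transmittance of parallel links: the sum of p * M(s) over the links."""
    return lambda s: sum(prob * mgf(s) for prob, mgf in links)


# Hypothetical parallel links between the same two states
links = [
    (0.06, lambda s: math.exp(100.0 * s)),   # e.g. a hardware-failure link, 100 h
    (0.02, lambda s: math.exp(50.0 * s)),    # e.g. a human-error link, 50 h
]
w_par = parallel_transmittance(links)

step = 1e-6
mean_time = (w_par(step) - w_par(-step)) / (2.0 * step) / w_par(0.0)
print(w_par(0.0), mean_time)
```

The resulting mean is the probability-weighted average of the two link times, conditional on the transition taking place.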
Similarly, we can extend the method to a link associated with a switch type and switch
activation requirements, as shown in Fig. 5. The transition time is a random variable made up
of two independent random variables (i.e., switch time (t_1) and activation time (t_2)).
Because the transition time and the activation time are independent, the total transition time
distribution function can be defined as follows:
$$ g_{ij}^{\ell}(t_1, t_2) = f_{ij}^{\ell}(t_1)\, r_{ij}^{\ell}(t_2) \qquad (7) $$
Therefore, the simple form of the link is given in Fig. 6.
Similarly, using the approach for a transition without switch type and activation
requirements, the system failure probability, MTTF and STTF can be calculated for the
warm–cold standby system with switch type and activation requirements.
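If the time spent on a switched link is taken to be the sum t1 + t2 of the two independent times in Eq. (7), its moment generating function is the product of the two individual moment generating functions. The Python sketch below illustrates this under that assumption; the helper name `switched_link` and the numbers are hypothetical.

```python
import math


def switched_link(prob, transition_mgf, activation_mgf):
    """Transmittance of a link whose time is the sum of two independent times."""
    return lambda s: prob * transition_mgf(s) * activation_mgf(s)


# Hypothetical link: probability 0.08, fixed times of 0.05 h and 0.01 h
w_link = switched_link(
    0.08,
    lambda s: math.exp(0.05 * s),
    lambda s: math.exp(0.01 * s),
)

step = 1e-6
mean_time = (w_link(step) - w_link(-step)) / (2.0 * step) / w_link(0.0)
print(w_link(0.0), mean_time)
```

The estimated mean equals the sum of the two individual means, as expected for independent times.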
4. Numerical illustration
Consider the aircraft ECS presented in Fig. 1, which maintains the temperature of the cabin at
the passenger comfort level. This warm–cold standby ECS is made up of two dedicated cooling
packs and a non-dedicated RAM air subsystem functioning as warm–cold standby subsystems. In
normal operation, the cooling packs operate automatically. However, flight crews always have
the manual override option. If one pack fails, the warm standby cooling pack can be activated
instantaneously. Subsequently, if the remaining pack fails, RAM air can be activated
automatically if the flight altitude is at or below 10,000 feet. Otherwise, flight crews may
switch manually to the cold standby RAM air subsystem. The manual switch activation requires a
certain procedure, which is time consuming. The required switch time is normally distributed
with mean 0.01 h. Also, in both switching cases, the time required to change the flight
altitude is normally distributed with mean 0.05 h. If a pack fails due to overheat, the flight
crews may repair the pack during flight. The time to repair the overheated pack is normally
distributed with mean 2 h. It is assumed that all normal distributions have zero standard
deviation. Flight crew errors or maintenance crew errors in repairing the pack may change the
state of the system. Fig. 7 presents the flow-graph model for the warm–cold standby ECS,
including four states and all transitions with their associated parameters.
In State 1, the cooling packs work with reliability 0.92 and mean time to failure 100 h, shown
with a self-loop. State 2 shows that the ECS is working vulnerably with just one cooling pack.
State 3 indicates that the RAM air standby subsystem, with reliability 0.7 and a normally
distributed lifetime with mean 500 h, is activated. Finally, RAM air failure due to CC, CHEs,
or HF leads to ECS failure in State 4 (Fig. 8).
Fig. 4. Flow-graph model for parallel state transitions.
Fig. 5. Flow-graph model for a state transition with switch type and activation requirements.
Fig. 7. Flow-graph model for a simple version of ECS.
Fig. 8. Reduced flow-graph model for Fig. 7.
Using step-wise reduction techniques and Eq. (1), we have the equivalent transition (W) from
State 1 to State 4 as follows:
$$ W = \frac{e + bcd + bf - e(cj + g + h) - bfh + egh}{1 - bi - cj - bck - a - g - h + bih + cja + a(g + h) + gh(1 - a)} \qquad (8) $$
where
$$ d = 0.08\,(e^{0.001s} + e^{0.01s})\big|_{s=0} = 0.16 $$
$$ e = 0.08\,e^{0.001s}\big|_{s=0} = 0.08 $$
$$ f = 0.08\,e^{0.001s}\big|_{s=0} = 0.08 $$
$$ g = 0.8\,e^{100s}\big|_{s=0} = 0.8 $$
$$ h = 0.7\,e^{500s}\big|_{s=0} = 0.7 $$
$$ i = 0.02\,e^{2s}\big|_{s=0} = 0.02 $$
$$ j = 0.01\,e^{2s}\big|_{s=0} = 0.01 $$
$$ k = 0.04\,e^{14s}\big|_{s=0} = 0.04 $$
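The link values listed above have the form of a probability times an exponential in s. This matches the moment generating function of a normal distribution, exp(μs + σ²s²/2), with the zero standard deviation assumed in this example. The Python sketch below reproduces two of the listed elements, g and h; the helper name `normal_link` is an assumption for illustration.

```python
import math


def normal_link(prob, mean, sd=0.0):
    """Transmittance p * exp(mean*s + (sd*s)**2 / 2) for a normally distributed time."""
    return lambda s: prob * math.exp(mean * s + 0.5 * (sd * s) ** 2)


g_link = normal_link(0.8, 100.0)   # g = 0.8 e^{100 s}
h_link = normal_link(0.7, 500.0)   # h = 0.7 e^{500 s}

# At s = 0 the exponential equals one, so each transmittance reduces to its probability.
print(g_link(0.0), h_link(0.0))

# The first derivative at s = 0, divided by the probability, recovers the mean time.
step = 1e-7
g_mean = (g_link(step) - g_link(-step)) / (2.0 * step) / g_link(0.0)
print(g_mean)
```

A non-zero standard deviation would leave the value at s = 0 unchanged and only affect the second derivative used in Eq. (4).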
Using Eqs. (2)–(4), the mean time to failure of the system is 1078 flight hours and the
standard deviation of time to failure is 1182 flight hours. Fig. 9 depicts the results of a
sensitivity analysis relating the reliability of the cooling packs and of RAM air to the mean
time to failure of the ECS. Improving the reliability of the two warm–cold standby subsystems
from 0.89 to 0.98 can improve the mean time to failure. However, reducing the probability of
the CC, CHE, and HF links has no significant effect on the MTTF.
5. Conclusion
This study focuses on the warm–cold standby ECS composed of non-identical dedicated and
non-dedicated standby subsystems that may be activated through automatic and manual switches
under certain activation requirements. Contrary to dedicated subsystems, non-dedicated
subsystems are not a general part of the ECS. They can only be activated if they are not
serving other systems. Therefore, their activation method is a manual switch subject to
meeting certain flight operational requirements. Furthermore, because humans play a vital role
in preparing the situation for switchover and in maintenance activities, human errors may
cause system degradation or failure. Therefore, we consider three types of failure, namely CC
failures, human errors, and HFs, in the developed model. Also, maintenance activities are
classified into two categories. Category I refers to the repair of a subsystem by the flight
crew during operation. Category II refers to the repair activity performed by maintenance
crews on site at the next maintenance inspection.
The developed model for calculating the time-to-failure data and the probability of ECS
failure is based on the flow-graph concept. The model is made up of nodes and links
representing states and transitions among states with arbitrary time distribution functions.
Using reduction techniques and the topology equation, the equivalent transition (W) from the
starting state to the end state can be obtained. Using Eqs. (2)–(4), we can calculate the
system failure probability, MTTF, and STTF. These data can be used for maintenance
optimization based on the failure limit strategy. Also, we can perform sensitivity analysis
for the MTTF and STTF of the ECS, obtained from Eqs. (3) and (4), by changing the reliability
of the subsystems and the transition elements. For future work, one may extend the multilayer
flow-graph for evaluating different reliability scenarios.
Acknowledgments
The authors would like to express their sincere appreciation to the anonymous referees for
their constructive comments, which enhanced the quality of the paper.
Appendix A
Fig. 10 presents a flow-graph model composed of nodes and links associated with transmittances
(i.e., a, b, etc.), representing states, transition processes and transition parameters,
respectively.
Fig. 10. Flow graph.
This closed flow graph has the following basic properties:
- only one start node,
- only one end node,
- at least one path from the start node to the end node,
- the topological equation describing the relationship between path transmittances of a
  closed flow graph is equal to zero.
In Fig. 10, the flow graph has five nodes (states) and seven links (transitions). If one
considers a dummy link from end node 'E' to start node 'S' with transmittance 1/W, where W is
the equivalent of the flow graph, the flow graph becomes a closed flow graph. The flow graph
has the following four loops:
- transmittance of loop I (L_1) = abf/W,
- transmittance of loop II (L_2) = abeg/W,
- transmittance of loop III (L_3) = adg/W,
- transmittance of loop IV (L_4) = c.
To find the equivalent of the flow graph, the topological Eq. (1) must be equal to zero:
$$ TP = 1 - \sum_{i} L_i + \sum_{i} \sum_{j} L_i L_j - \sum_{i} \sum_{j} \sum_{k} L_i L_j L_k + \cdots = 0 $$
where the first-order loop (i.e., L_i) is composed of only one loop and the second-order loop
(i.e., L_i L_j) is composed of the product of two disjoint loops i and j. These disjoint loops
i and j have no intersection in their nodes and links. Similarly, the nth-order loop can be
defined as a product of n disjoint loops. For example, the transmittance of the first-order
loops in Fig. 10 is given by
$$ \sum_{i=1}^{4} L_i = \frac{abf}{W} + \frac{abeg}{W} + \frac{adg}{W} + c \qquad (9) $$
and there is no second-order loop, because the loops are not disjoint. Thus, by substituting
the transmittance of the first-order loops into the TP Eq. (1), and setting the remaining
terms equal to zero, we obtain
$$ TP = 1 - \left( \frac{abf}{W} + \frac{abeg}{W} + \frac{adg}{W} + c \right) \qquad (10) $$
By setting the TP equation equal to zero, we obtain
$$ W = \frac{abf + abeg + adg}{1 - c} \qquad (11) $$
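The derivation above can be checked numerically. The Python sketch below evaluates the topology expression for a set of loops, skipping products of loops that share a node, and confirms that the W of Eq. (11) makes it vanish. The function name `topology_value`, the transmittance values, the node sets assigned to each loop, and the placement of the self-loop c on node 1 are all assumptions made for this illustration; they are not read from Fig. 10.

```python
from itertools import combinations


def topology_value(loops):
    """Evaluate 1 - sum(L) + sum(products of disjoint pairs) - ... for the given loops.

    loops is a list of (value, nodes) pairs, where nodes is the set of nodes on the loop.
    """
    total = 1.0
    for order in range(1, len(loops) + 1):
        for combo in combinations(loops, order):
            node_sets = [nodes for _, nodes in combo]
            # The loops in combo are disjoint only if no node appears in two of them.
            shared = sum(len(n) for n in node_sets) != len(set().union(*node_sets))
            if shared:
                continue
            product = 1.0
            for value, _ in combo:
                product *= value
            total += (-1) ** order * product
    return total


# Hypothetical transmittances evaluated at s = 0
a, b, c, d, e, f, g = 1.0, 0.5, 0.1, 0.4, 0.4, 0.6, 1.0
w = (a * b * f + a * b * e * g + a * d * g) / (1.0 - c)   # Eq. (11)

loops = [
    (a * b * f / w, {"S", "1", "2", "E"}),           # loop I
    (a * b * e * g / w, {"S", "1", "2", "3", "E"}),  # loop II
    (a * d * g / w, {"S", "1", "3", "E"}),           # loop III
    (c, {"1"}),                                      # loop IV (self-loop, assumed on node 1)
]
residual = topology_value(loops)
print(w, residual)
```

Because every pair of these loops shares at least one node, only the first-order terms contribute, in line with the remark that there is no second-order loop.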
Therefore, the equivalent of the flow graph in Fig. 10 is a single-node flow graph with the
transmittance of Eq. (11), where each link transmittance is the product of its probability and
its moment generating function. For example, in Fig. 10, the transmittance of the link between
node '2' and node '3' denoted by 'e' is equal to $P_{23} \int_{t} e^{st} f_{23}(t)\,dt$.
Fig. 9. Mean time to failure of ECS. [Plot: horizontal axis 'Reliability of the subsystem'
(0.89 to 0.98); vertical axis ticks 0 to 100000; series 'Cooling packs' and 'RAM Air'.]
References
[1] Dhillon BS, Rayapati SN. Common-cause failure and human error modeling of redundant
systems with partially energized standby units. Reliab Eng 1987;19:1–14.
[2] Aven T. Availability formulae for standby systems of similar units that are preventively
maintained. IEEE Trans Reliab 1990;39(5):603–6.
[3] She J, Pecht MG. Reliability of k-out-of-n warm standby system. IEEE Trans Reliab
1992;1(2):50–9.
[4] Dhillon BS, Yang N. Human error analysis of a standby redundant system with arbitrarily
distributed repair times. Microelectron Reliab 1993;33(3):431–44.
[5] Pham TG, Turkkan N. Reliability of a standby system with beta-distributed component lives.
IEEE Trans Reliab 1994;43(1):71–5.
[6] Subramanian R, Anantharaman V. Reliability analysis of a complex standby redundant
system. Reliab Eng Syst Saf 1995;45:57–70.
[7] Dhillon BS, Yang N. Probabilistic analysis of a maintainable system with human error. J Qual
Maint Eng 1995;1(2):50–9.
[8] Aven T, Optal K. On the steady state unavailability of standby systems. Reliab Eng Syst Saf
1996;52:171–5.
[9] Sridharan V, Mohanavadivu P. Some statistical characteristics of a repairable, standby,
human and machine system. IEEE Trans Reliab 1998;47(4):431–5.
[10] Mahmoud MAW, Esmail MA. Stochastic analysis of a two-unit warm standby system with
slow switch subject to hardware and human error failures. Microelectron Reliab
1998;38:1639–44.
[11] Zhong YL. An optimal geometric process model for a cold standby repairable system. Reliab
Eng Syst Saf 1999;63:107–10.
[12] Ke J, Wang K. The reliability analysis of balking and reneging in a repairable system with
warm standbys. Qual Reliab Eng Int 2002;18:467–78.
[13] Motta SB, Colosimo EA. Determination of preventive maintenance periodicities of standby
devices. Reliab Eng Syst Saf 2002;76:149–54.
[14] Utkin LV. Imprecise reliability of cold standby systems. Int J Qual Reliab Manage
2003;20(6):722–39.
[15] Seo JH, Jang JS, Ba DS. Lifetime and reliability estimation of repairable redundant system
subject to periodic alternation. Reliab Eng Syst Saf 2003;80:197–204.
[16] Zhang T, Xie M, Horigome M. Availability and reliability of k-out-of-(M+N)-G warm standby
systems. Reliab Eng Syst Saf 2006;20(6):722–39.
[17] Azaron A, Katagiri H, Kato K, Sakawa M. Reliability evaluation of multicomponent
cold-standby redundant systems. Appl Math Comput 2006;173:137–49.
[18] Jayabalan V, Chaudhuri D. Optimal maintenance and replacement policy for deteriorating
system with increased mean downtime. Naval Res Logistics 1992;39:67–78.
[19] Pritsker AAB, Happ WW. GERT: Graphical evaluation and review technique:
Part I: fundamental. J Industrial Eng 1966;17(5):267–74.

Preview text:


Operational reliability assessment of an aircraft environmental control system K. Jenab , K. Rashidi
Department of Mechanical and Industrial Engineering, Ryerson University, Toronto, Ontario, Canada M5B 2K3 a r t i c l e i n f o a b s t r a c t Article history:
The aircraft environmental control system (ECS) is composed of several non-identical and nondedicated subsystems working Received 29 August 2007
as warm–cold standby subsystems. Also, their state transition times are arbitrary distributed. This paper presents a flow-graph- Received in revised form
based method to calculate time-to-failure data and failure probability of the ECS. The obtained data from the model may be 14 February 2008
used for maintenance optimization that employs the failure limit strategy for ECS. The model incorporates detectable failures Accepted 9 May 2008 Available
such as hardware failures, critical human errors, common-cause failures, maintenance categories, and switch activation online 17 May 2008
methods. A numerical example is also presented to demonstrate the application of the model.
& 2008 Elsevier Ltd. All rights reserved. Keywords:
Hybrid warm–cold standby system Human errors Flow-graph Maintenance optimization 1. Introduction
human error failures were studied. In [11], assuming imperfect repair, an
In [1], authors developed probabilistic reliability models taking into
optimal geometric model for a cold standby repairable system with only two
account common-cause failure, human errors, and partially energized standby
identical units was developed. In [12], a system composed of m operating
subsystems. In [2], Using Markovian method, author developed formula for
units, w warm standby units and R repairmen was studied. In [13], using
the availability of the standby system composed of identical units that are reliability techniques, a
preventively maintained. In [3], authors developed a closed-form equation for
statistical method was proposed for preventive maintenance of a standby
a k-out-of-n warm standby system with dormant failure. In [4], authors
relays in power system. In [14], authors developed imprecise reliability models
presented a human errors analysis model with arbitrarily distributed repair
of a cold standby system because of unavailability of precise probability
times for a system composed of two working units and one standby unit. In
distribution of the unit times to failure. In [15], considering a minimal repair
[5], using exact distribution of the sum of two independent beta variables, the
with negligible repair time, a standby system lifetime was studied. In [16], a
reliability of the standby system composed of units with beta-distributed
warm standby system composed of units with different failure and repair rates
lifetime was calculated. In [6], a two-unit standby system was investigated
was studied. In [17], a cold standby system made up of non-repairable units
wherein the standby unit is put in cold state for a certain amount of time
with Erlang distribution lifetimes was investigated. Table 1 classifies the
before it is allowed to become warm. In [7], using Markovian model, authors
published literature dealt with standby systems subject to human errors,
analyzed a standby system made up of n units in parallel start operating and
common cause (CC), and hardware failures (HFs). This classification shows that
remaining m units are in standby mode. In [8], considering the constant failure
the published papers overlooked a warm–cold standby system defined as
rate, the availability of a standby system with n+1 identical units and one
hybrid standby system with arbitrary failure distributions. Therefore, the aim
standby unit was studied. In [9], several measures of reliability for a two-unit
of this study is to investigate such a standby system, and calculate time-to-
warm standby system with slow switch considering hardware and human error
failure data and the system failure probability by using a flow-graph-based
failures were assessed by Markovian method. In [10], using the regenerative
method. The results are required for maintenance optimization employing the
point technique in Markov renewal processes, the reliability measures for a
failure limit strategy [18]. 2. Problem description
two-unit warm standby system with a slow switch subject to hardware and
Corresponding author. Tel.: +1416 979 5000x6424; fax: +1416 979 5265. E-mail
address: jenab@ryerson.ca (K. Jenab).
0951-8320/$ - see front matter & 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.ress.2008.05.003
One of the major functions of the aircraft environmental
control system (ECS) is to maintain the temperature of the cabin at passenger
comfort level. This hybrid standby system is made up of two dedicated cooling packs and a
non-dedicated RAM air, as shown in Fig. 1. The non-dedicated subsystem can be manually activated
only at a certain flight altitude by the flight crew. In normal operation, both cooling packs
operate automatically. In case of one pack failure, the remaining pack automatically takes over
the load and an appropriate message is posted on the crew alert system (CAS) display in the
aircraft cockpit. The system state is reversible if the cooling pack fails due to overheat. Up to
this stage, we have a two-unit warm standby system with automatic switch subject to HFs. In case
of both packs failing, crews are instructed to switch manually to RAM air in accordance with the
aircraft flight manual. This stage is a cold standby with slow switch subject to hardware and crew
error failures. The crew errors may result from poor control panel design, poor work environment,
poor task assignment, inadequate training, poorly written flight manual, operating procedures, and
deficiency of the master minimum equipment list. Therefore, the required maintenance activities
performed by the flight crews during the flight (Category I) or by maintenance crews in the bases
(Category II) may be associated with human errors.

Table 1
Classification of published literature on standby systems

Failure distribution   Ref.   Configuration of standby   Model
With human errors
  Exponential          [1]                               Markov
  Exponential          [4]    Cold                       Markov
  Exponential          [7]    Warm                       Markov
  Exponential          [9]    Warm                       Mathematical expansion
  Exponential          [10]   Warm                       Mathematical expansion
Without human errors
  Exponential          [2]    Cold                       Markov/simulation
  Exponential          [3]    Warm                       Mathematical expansion
  Beta                 [5]    Cold                       Mathematical expansion
  Exponential          [6]    Hot/warm/cold              Mathematical expansion
  Exponential          [8]    Cold                       Markov
  Exponential          [11]                              Mathematical expansion
  Exponential          [12]   Warm                       Mathematical expansion
  Exponential          [13]   Cold                       Mathematical expansion
  Exponential          [14]   Cold                       Mathematical expansion
  Weibull              [15]   Cold                       Simulation
  Exponential          [16]   Warm                       Markov
  Erlang               [17]   Cold                       Mathematical expansion
To calculate time-to-failure data and the failure probability of such a system, we define a
warm–cold standby system (hybrid) with related terms and conditions in the remaining part of this
section. The warm–cold standby system comprises several non-identical and non-dedicated
subsystems, which function independently; however, they can be manually or automatically
activated subject to
(1) manual override option,
(2) subsystem availability if it is non-dedicated, and
(3) meeting certain flight operational requirements such as flight altitude.

The dedicated subsystems work in a warm standby configuration with automatic switch activation.
On the other hand, the non-dedicated subsystems work in cold standby configuration with manual
activation in accordance with the flight manual. From a maintenance point of view, the subsystems
may be repaired during the flight by the flight crews or at the base by the maintenance crews
based on maintenance procedures. The maintenance task may be subject to critical human errors
(CHEs) resulting from poor design, poor work environment, poor task assignment, inadequate
training, poorly written manuals, operating procedures, and maintenance procedures. Also, there
exist CC failures that may be caused by a common design or material deficiency, a common
installation error, a common maintenance error, or a common harsh environment. This type of
failure leads to total system failure. The third type of failure is HF (i.e., subsystem failure
and switch failure), which can be classified into detectable and non-detectable HFs. Occurrence of
a detectable failure is announced by a warning, caution, or advisory message on the CAS display in
the aircraft cockpit for taking appropriate action in accordance with the flight and maintenance
manuals. A non-detectable failure, by contrast, remains dormant until the next maintenance
inspection.

To develop an analytical model for computing time-to-failure data and failure probability of such
a system, the following assumptions are taken into account:

- Dedicated subsystems only serve the warm–cold standby system.
- Non-dedicated subsystems may serve the warm–cold standby system subject to availability and
  meeting the operational requirements.
- Category I maintenance task can be performed by the flight crews.
- Category II maintenance task requires maintenance crews in site.
- Common-cause and HF rates are arbitrarily distributed and statistically independent.
- Human errors, common-cause, and HFs occur independently.
- The repair times of the failed subsystems are arbitrarily distributed.
- The failed system repair times are arbitrarily distributed.
- A common-cause failure or a human error can occur and trigger the system failure from any of
  its operable states.
- Repair is unrestricted for subsystems and system.
- The repaired subsystem or system is as good as new.
- Switchover mechanism is automatic in warm standby and is manual in cold standby configuration.
- Self-loops in the flow graph, which represent that the corresponding subsystem is properly
  working, are s-independent [19].

Fig. 1. Environmental control system block diagram. (Blocks: ECS control system, cooling pack 1,
cooling pack 2, RAM air; dedicated and non-dedicated connections.)

2.1. Notation

t                   transition time due to failures (i.e., hardware, human error, common cause)
ECS                 environmental control system
CAS                 crew alert system
CC                  common cause
CHE                 critical human errors in operation or maintenance
HW                  hardware failure including subsystem and switch
$\ell$              link index, i.e., CC, CHE, HW
$P_{ij}^{\ell}$     probability of transition from node i to node j in link $\ell$
$P_{std}$           failure probability of the hybrid standby system
$Av_{ij}^{\ell}$    availability of non-dedicated subsystem for switchover in transition ij and link $\ell$
MTTF                mean time to failure of the hybrid standby system
STTF                standard deviation of time to failure of the hybrid standby system
$f_{ij}^{\ell}(t)$  time distribution function for transition from state i to state j with index $\ell$
$r_{ij}^{\ell}(t)$  time distribution function for switch activation in transition from state i to
                    state j with index $\ell$
$W_{ij}$            equivalent transition from state i to state j
$W_{std}$           equivalent transition of the hybrid standby system model
$L_m$               the mth first-order loop in the model
t                   total number of disjoint loops in the model

3. Model description

The analytical model is based on the flow-graph concept presented in Appendix A. The flow graph is
made up of several nodes corresponding to the states of the warm–cold standby ECS system and of
links corresponding to transitions among states. The link represented by an arrow is associated
with transition probability, transition time distribution, switch type (automatic, manual), switch
activation requirements (availability, operational procedure), and transition description. Fig. 2
presents a state transition without a switch type and the switch activation requirements used for
the CC failure and maintenance links. In case of more than one link between two states, we use the
index $\ell$ to differentiate them where the sum of the probabilities for outgoing links is equal
to one. The link description indicates either the type of failures (‘CC’, critical human errors in
flight or maintenance ‘CHE’, ‘HW’) or categories of maintenance activities (Category I, Category
II). Category I is a simple maintenance procedure that may be performed by the flight crews.
However, if a maintenance activity requires special tools and skills, this activity falls into
Category II, which can be performed by the maintenance crews in the base.

Fig. 2. Flow-graph model for a state transition without switch type and activation requirements.
(Nodes $S_i$ and $S_j$ joined by a link labelled with the transition description.)

To calculate the time-to-failure data and the probability of the warm–cold standby ECS system
failure, the equivalent transition (W) of the system can be derived from the topology equation [1]:

$$1 - \sum_{m=1}^{t} L_m + \sum_{m=1}^{t} \sum_{\substack{n=1 \\ m \neq n}}^{t} L_m L_n - \sum_{m=1}^{t} \sum_{n=1}^{t} \sum_{\substack{p=1 \\ m \neq n \neq p}}^{t} L_m L_n L_p + \cdots = 0 \tag{1}$$

where t is the total number of disjoint loops in the ECS flow-graph model. The first-order loop
(i.e., $L_m$) is composed of only one loop and the second-order loop (i.e., $L_m L_n$) is composed
of the product of two disjoint loops m and n. These disjoint loops m and n have no intersection in
their nodes and links. Similarly, the tth-order loop can be defined as a product of t disjoint
loops. The probability of failure of the system is

$$P_{std} = W_{std}\big|_{s=0} \tag{2}$$

and the mean time to failure of the system can be obtained from the expression below:

$$\mathrm{MTTF} = \frac{1}{P_{std}} \left. \frac{\partial W_{std}}{\partial s} \right|_{s=0} \tag{3}$$

Also, the standard deviation of time to failure is defined by

$$\mathrm{STTF} = \sqrt{ \frac{1}{P_{std}} \left. \frac{\partial^2 W_{std}}{\partial s^2} \right|_{s=0} - [\mathrm{MTTF}]^2 } \tag{4}$$

A developed model for the warm–cold standby ECS system may be simplified by using series and
parallel reduction techniques.
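As a rough numerical illustration of Eqs. (2)–(4), the sketch below estimates the failure probability, MTTF, and STTF from any equivalent transition W(s) supplied as a Python function. It is not the authors' implementation: the derivatives at s = 0 are approximated by central differences, and the function names (`time_to_failure_stats`, `self_loop_W`) and the one-state example model are assumptions made for this illustration only.

```python
import math


def time_to_failure_stats(W, h=1e-6):
    """Estimate (P_std, MTTF, STTF) from W(s) following Eqs. (2)-(4).

    The derivatives at s = 0 are approximated by central differences with
    step h; this numerical shortcut is an assumption of the sketch.
    """
    w0 = W(0.0)                                  # Eq. (2): P_std = W(0)
    d1 = (W(h) - W(-h)) / (2.0 * h)              # dW/ds at s = 0
    d2 = (W(h) - 2.0 * w0 + W(-h)) / (h * h)     # d2W/ds2 at s = 0
    mttf = d1 / w0                               # Eq. (3)
    variance = d2 / w0 - mttf ** 2               # term under the root in Eq. (4)
    return w0, mttf, math.sqrt(max(variance, 0.0))


def self_loop_W(s, r=0.92, tau=100.0):
    """Hypothetical one-state model, used only for illustration.

    The state repeats a self-loop with probability r, each pass lasting a
    fixed tau hours, and exits with probability 1 - r in zero time.
    """
    return (1.0 - r) / (1.0 - r * math.exp(tau * s))


p_std, mttf, sttf = time_to_failure_stats(self_loop_W)
print(p_std, mttf, sttf)
```

For this hypothetical self-loop model the results should come out close to the closed-form values r·tau/(1 − r) = 1150 h for the mean and tau·sqrt(r)/(1 − r), roughly 1199 h, for the standard deviation. The step h has to be small compared with the scale on which W(s) varies near s = 0.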
Fig. 3. Flow-graph model for series of state transitions.

3.1. Series of state transitions

Using Eq. (1), the equivalent transition for a series of state transitions depicted in Fig. 3 can
be defined as follows:

$$W_{xy} = \prod_{\substack{i=x \\ j=i+1}}^{y} P_{ij}^{\ell} \int_{t} e^{st} f_{ij}^{\ell}(t)\,dt \tag{5}$$

where $\ell = 1$, x is the starting state and y is the last state in the series.

3.2. Parallel state transitions

Fig. 4 shows parallel transition states between state i and state j that represent transitions
because of HFs, CHEs in operation, maintenance, etc. The equivalent of this parallel transition
can be derived from the topology equation as follows:

$$W_{ij} = \sum_{\ell=1}^{k} P_{ij}^{\ell} \int_{t} e^{st} f_{ij}^{\ell}(t)\,dt \tag{6}$$

where $\ell$ is the index for the link between state i and state j.

Fig. 4. Flow-graph model for parallel state transitions.

Similarly, we can extend the method for a link associated with switch type and switch activation
requirements shown in Fig. 5. The transition time is a random variable made up of two independent
random variables (i.e., switch time ($t_1$) and activation time ($t_2$)). Because transition time
and activation time are independent, the total transition time distribution function can be
defined as follows:

$$g_{ij}^{\ell}(t_1, t_2) = f_{ij}^{\ell}(t_1)\, r_{ij}^{\ell}(t_2) \tag{7}$$

Therefore, the simple form of the link is given in Fig. 6. Similarly, using the approach for
transition without switch type and activation requirements, the system failure probability, MTTF
and STTF can be calculated for the warm–cold standby system with switch type and activation
requirements.

Fig. 5. Flow-graph model for a state transition with switch type and activation requirements.
(Nodes $S_i$ and $S_j$ joined by a link labelled with switch type/activation requirements and
transition description.)

4. Numerical illustration

Consider an aircraft ECS presented in Fig. 1 that maintains the temperature of the cabin at the
passenger comfort level. This warm–cold standby ECS system is made up of two dedicated cooling
packs and a non-dedicated RAM air functioning as warm–cold standby subsystems. In normal
operation, the cooling packs operate automatically. However, flight crews always have a manual
override option. In case of one pack failure, the warm standby cooling pack can be instantaneously
activated. Subsequently, if the remaining pack fails, RAM air can be automatically activated if
the flight altitude is at or below 10,000 feet. Otherwise, flight crews may switch manually to the
cold standby RAM air subsystem. The manual switch activation requires a certain procedure, which
is time consuming. The required switch time is normally distributed with mean time 0.01 h. Also,
in both switching cases, the required time to change the flight altitude is normally distributed
with mean 0.05 h. If a pack fails due to overheat, the flight crews may repair the pack during
flight. The time to repair the overheated pack is normally distributed with mean time 2 h. Also,
it is assumed that all normal distributions have zero standard deviation. The flight crew errors
or maintenance crew errors in repairing the pack may result in changing the state of the system.
Fig. 7 presents the flow-graph model for the warm–cold standby ECS including four states and all
transitions with their associated parameters.

In State 1, the cooling packs work with reliability 0.92 and mean time to failure 100 h, shown
with a self-loop. State 2 shows that the ECS is vulnerably working with just one cooling pack.
State 3 indicates that the RAM air standby subsystem with reliability 0.7 and mean normal
lifetime 500 h is activated. Finally, the RAM air failure due to CC, CHEs, or HF leads to the ECS
system failure in State 4 (Fig. 8).

Using step-wise reduction techniques and Eq. (1), we have the equivalent transition (W) from
State 1 to State 4 as follows:

$$W = \frac{e + bcd + bf - e(cj + g + h) - bfh + egh}{1 - bi - cj - bck - a - g - h + bih + cja + a(g + h) + gh(1 - a)} \tag{8}$$

where
$d = 0.08\,(e^{0.001s} + e^{0.01s})\big|_{s=0} = 0.16$
$e = 0.08\,e^{0.001s}\big|_{s=0} = 0.08$
$f = 0.08\,e^{0.001s}\big|_{s=0} = 0.08$
$g = 0.8\,e^{100s}\big|_{s=0} = 0.8$
$h = 0.7\,e^{500s}\big|_{s=0} = 0.7$
$i = 0.02\,e^{2s}\big|_{s=0} = 0.02$
$j = 0.01\,e^{2s}\big|_{s=0} = 0.01$
$k = 0.04\,e^{14s}\big|_{s=0} = 0.04$

Using Eqs. (2)–(4), the mean time to failure of the system is 1078 flight hours. Also, the
standard deviation of time to failure is 1182 flight hours. Performing sensitivity analysis,
Fig. 9 depicts the relationship of the cooling pack and the RAM air reliability with the mean time
to failure of the ECS. The reliability improvement of two warm–cold standby subsystems from 0.89
to 0.98 can improve the mean time to failure. However, reducing the probability of CC, CHEs, and
HF links has no significant effect in improving MTTF.

Fig. 7. Flow-graph model for a simple version of ECS.

Fig. 8. Reduced flow-graph model for Fig. 7. (States $S_1$ and $S_4$ joined by the equivalent
transition W for state 1 to state 4, closed by a dummy link 1/W.)

5. Conclusion

This study focuses on the warm–cold ECS standby system composed of non-identical and
(non)-dedicated standby subsystems that may be activated through automatic and manual switches
under certain activation requirements. Contrary to dedicated subsystems, non-dedicated subsystems
are not a general part of the ECS system. They can only be activated if they are not serving other
systems. Therefore, their activation method is manual switch subject to meeting certain flight
operational requirements. Furthermore, because humans play a vital role in preparing the situation
for switchover and maintenance activities, human errors may cause the system degradation or
failure. Therefore, we consider three types of failure including CCs, human errors, and HFs in the
developed model. Also, maintenance activities are classified into two categories. Category I
refers to the repair of the subsystem by the flight crew during the operation time. Category II
refers to the repair activity performed by maintenance crews in the next maintenance inspection in
site.

The developed model for calculating the time-to-failure data and the probability of the ECS system
failure is based on the flow-graph concept. The model is made up of nodes and links representing
states and transitions among states with arbitrary time distribution function. Using reduction
techniques and topology equation, the equivalent transition (W) from the starting state to the end
state can be obtained. Using Eqs. (2)–(4), we can calculate the system failure probability, MTTF,
and STTF. These data can be used for maintenance optimization based on limited failure strategy.
Also, we can perform sensitivity analysis for MTTF and STTF of the ECS system that can be obtained
from Eqs. (3) and
(4) based on changing the reliability of the subsystems, and transition elements. For future work,
one may extend the multilayer flow-graph for evaluating different reliability scenarios.

Fig. 9. Mean time to failure of ECS. (Plot of mean time to failure, vertical axis 0 to 100000,
against reliability of the subsystem, 0.89 to 0.98; series: cooling packs, RAM air.)

Acknowledgments

The authors would like to express their sincere appreciation to anonymous referees for the
constructive comments, which enhanced the quality of the paper.

Appendix A

Fig. 10 presents a flow-graph model composed of nodes and links associated with transmittance
(i.e., a, b, etc.) representing states, transition process and transition parameters,
respectively. This closed flow graph has the following basic properties:

- only one start node,
- only one end node,
- at least one path from the start to the end nodes,
- topological equation describing the relationship between path transmittance of a closed flow
  graph is equal to zero.

Fig. 10. Flow graph. (Nodes S, 1, 2, 3, E; links a to g; dummy link 1/W.)

In Fig. 10, the flow graph has five nodes (states) and seven links (transitions). If one considers
a dummy link from end node ‘E’ to start node ‘S’ with transmittance 1/W, where W is the equivalent
of the flow graph, the flow graph becomes a closed flow graph. The flow graph has the following
four loops:

transmittance of loop I ($L_1$) = abf/W,
transmittance of loop II ($L_2$) = abeg/W,
transmittance of loop III ($L_3$) = adg/W,
transmittance of loop IV ($L_4$) = c.

To find the equivalent of the flow graph, the topological Eq. (1) must be equal to zero:

$$TP = 1 - \sum_{i} L_i + \sum_{i} \sum_{j} L_i L_j - \sum_{i} \sum_{j} \sum_{k} L_i L_j L_k + \cdots = 0$$

where the first-order loop (i.e., $L_i$) is composed of only one loop and the second-order loop
(i.e., $L_i L_j$) is composed of the product of two disjoint loops ‘i’ and ‘j’. These disjoint
loops ‘i’ and ‘j’ have no intersection in their nodes and links. Similarly, the ‘n’th-order loop
can be defined as a product of ‘n’ disjoint loops. For example, the transmittance of the
first-order loops in Fig. 10 is given by

$$\sum_{i=1}^{4} L_i = \frac{abf}{W} + \frac{abeg}{W} + \frac{adg}{W} + c \tag{9}$$

and there is no second-order loop, because the loops are not disjoint. Thus, by substituting the
transmittance of the first-order loops into the TP Eq. (1), and setting the remaining terms equal
to zero, we obtain

$$TP = 1 - \left( \frac{abf}{W} + \frac{abeg}{W} + \frac{adg}{W} + c \right) \tag{10}$$

By setting the TP equation equal to zero, we obtain

$$W = \frac{abf + abeg + adg}{1 - c} \tag{11}$$

Therefore, the equivalent of the flow graph in Fig. 10 is a single-node flow graph with the
transmittance of Eq. (11), where each link transmittance is the product of its probability and its
moment generating function. For example, in Fig. 10, the transmittance of the link between node
‘2’ and node ‘3’ denoted by ‘e’ is equal to $P_{23} \int_{t} e^{st} f_{23}(t)\,dt$.
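The reduction in Eqs. (9)–(11) can be checked numerically with a short sketch. All link probabilities and durations below are hypothetical values chosen for illustration, not data from the paper. Each link is given a fixed duration, so its moment generating function is exp(s × duration), and loop IV (c) is assumed to share a node with every path, as Eq. (11) implies. The helper names `link`, `equivalent_transmittance`, and `topology_residual` are likewise assumptions.

```python
import math


def link(p, duration):
    """Link transmittance p * E[exp(s*T)] for a fixed duration T (hours)."""
    return lambda s: p * math.exp(duration * s)


# Hypothetical (probability, duration) pairs for the seven links of Fig. 10.
LINKS = {
    "a": link(1.0, 0.0),
    "b": link(0.5, 10.0),
    "c": link(0.2, 5.0),
    "d": link(0.3, 20.0),
    "e": link(0.4, 2.0),
    "f": link(0.6, 1.0),
    "g": link(1.0, 3.0),
}


def path_sum(v):
    """Sum of the three start-to-end path transmittances abf + abeg + adg."""
    return (v["a"] * v["b"] * v["f"]
            + v["a"] * v["b"] * v["e"] * v["g"]
            + v["a"] * v["d"] * v["g"])


def equivalent_transmittance(s):
    """Eq. (11): W = (abf + abeg + adg) / (1 - c)."""
    v = {name: fn(s) for name, fn in LINKS.items()}
    return path_sum(v) / (1.0 - v["c"])


def topology_residual(s):
    """Eq. (10): TP = 1 - (abf/W + abeg/W + adg/W + c), expected to vanish."""
    v = {name: fn(s) for name, fn in LINKS.items()}
    w = equivalent_transmittance(s)
    return 1.0 - (path_sum(v) / w + v["c"])


print(equivalent_transmittance(0.0))   # probability of reaching the end node
print(topology_residual(-0.01))        # residual of the closed-graph equation
```

With these hypothetical probabilities the outgoing links of each node sum to one, so W at s = 0 should be 1 up to rounding, and the residual of Eq. (10) should be zero up to rounding for any s at which c stays below 1.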
[1] Dhillon BS, Rayapati SN. Common-cause failure and human error modeling of redundant
systems with partially energized standby units. Reliab Eng 1987;19:1–14.
[2] Aven T. Availability formulae for standby systems of similar units that are preventively
maintained. IEEE Trans Reliab 1990;39(5):603–6.
[3] She J, Pecht MG. Reliability of k-out-of-n warm standby system. IEEE Trans Reliab 1992;1(2):50–9.
[4] Dhillon BS, Yang N. Human error analysis of a standby redundant system with arbitrarily
distributed repair times. Microelectron Reliab 1993;33(3):431–44.
[5] Pham TG, Turkkan N. Reliability of a standby system with beta-distributed component lives.
IEEE Trans Reliab 1994;43(1):71–5.
[6] Subramanian R, Anantharaman V. Reliability analysis of a complex standby redundant
system. Reliab Eng Syst Saf 1995;45:57–70.
[7] Dhillon BS, Yang N. Probabilistic analysis of a maintainable system with human error. J Qual Maint Eng 1995;1(2):50–9.
[8] Aven T, Optal K. On the steady state unavailability of standby systems. Reliab Eng Syst Saf 1996;52:171–5.
[9] Sridharan V, Mohanavadivu P. Some statistical characteristics of a repairable, standby,
human and machine system. IEEE Trans Reliab 1998;47(4):431–5.
[10] Mahmoud MAW, Esmail MA. Stochastic analysis of a two-unit warm standby system with
slow switch subject to hardware and human error failures. Microelectron Reliab 1998;38:1639–44.
[11] Zhong YL. An optimal geometric process model for a cold standby repairable system. Reliab
Eng Syst Saf 1999;63:107–10.
[12] Ke J, Wang K. The reliability analysis of balking and reneging in a repairable system with
warm standbys. Qual Reliab Eng Int 2002;18:467–78.
[13] Motta SB, Colosimo EA. Determination of preventive maintenance periodicities of standby
devices. Reliab Eng Syst Saf 2002;76:149–54.
[14] Utkin LV. Imprecise reliability of cold standby systems. Int J Qual Reliab Manage 2003;20(6):722–39.
[15] Seo JH, Jang JS, Ba DS. Lifetime and reliability estimation of repairable redundant system
subject to periodic alternation. Reliab Eng Syst Saf 2003;80:197–204.
[16] Zhang T, Xie M, Horigome M. Availability and reliability of k-out-of-(M+N)-G warm standby
systems. Reliab Eng Syst Saf 2006;20(6):722–39.
[17] Azaron A, Katagiri H, Kato K, Sakawa M. Reliability evaluation of multicomponent cold-
standby redundant systems. Appl Math Comput 2006;173:137–49.
[18] Jayabalan V, Chaudhuri D. Optimal maintenance and replacement policy for deteriorating
system with increased mean downtime. Naval Res Logistics 1992;39:67–78.
[19] Pritsker AAB, Happ WW. GERT: Graphical evaluation and review technique:
Part I: fundamental. J Industrial Eng 1966;17(5):267–74.