Project Title: Critical Analysis of the Use of Redundancy to Achieve Survivability in the Presence of Malicious Attacks
Organization: University of Wisconsin - Milwaukee
AO Number: F165
Contract Number: F30602-97-1-0205
Start Date: 14 APR 1997
End Date: 14 APR 2000

Principal Investigator

Name:

Yvo     Desmedt
Address: Univeristy of Wisconsin Milwaukee, Department of Electrical Engineering and Computer Science
  PO Box 784
City,  State   Zip: Milwaukee,   WI     53201
Phone: (414) 229-6762
Fax: (414) 229-6958
Email: desmedt@cs.uwm.edu
Level Of Participation - Billed: 27%
Level Of Participation - Unbilled: 5%
 

Financial POC

Name:

Robert     Jones
Address: University of Wisconsin-Milwaukwee, D. Mitchell Hall Rm 23
  2407 East Newton Ave.
City,  State   Zip: Milwaukee,   WI     53211
Phone: (414)229-4996
Fax: (414)229-6967
Email: rajones@csd.uwm.edu
 
Project URL: http://www.cs.uwm.edu/~desmedt/survivability.html
Objective: From the research on communication security one learns that, although redundancy has been utilized to achieve reliability, if the errors are caused maliciously the use of redundancy does not necessarily work. The goal is to adapt the lessons from the research on communication security to study when redundancy can and cannot be used to achieve survivability.
Approach: Redundancy has been used to achieve reliability in the context of fault tolerant computation, reliable communication and reliable networks. While reliability is solely concerned with accidental errors, survivability must also deal with malicious faults.

One can distinguish two types of malicious errors. In the first one, the faults are independent, while in the second they are dependent. Examples are now given to illustrate when these assumptions may be realistic in the context of protecting the survivability of computer systems. Suppose that the redundant hardware, algorithms and software used have been developed independently. Then the opponents likely need to develop independent attacks for many of these subsystems to be successful (except if a platform independent attack can be mounted). If, on the other hand, the same software has been replicated, a fault will be duplicated, which implies that the faults are dependent.

Now, when the malicious errors are independent it is reasonable to assume that when dealing with an attack with limited resources, the number of faults are limited. However, such an assumption makes no sense when the faults are strongly dependent (for example when the same faulty software has been replicated).

In the context of communication security redundancy helps when dealing with a limited number of independent faults (using error-correcting codes), but the use of error-detection (or error-correcting) codes does not help when dealing with unlimited dependent errors. However, the use of authentication mechanisms allows one to detect the existence of an unlimited number of malicious faults. One should note that the work of Seberry and Safavi-Naini (IT 1991) has demonstrated that some authentication methods are nothing else than wrapped error-detection codes. The open problem whether and when redundancy helps to achieve survivability can, based on this analogy, be split into two subquestions depending whether the faults are dependent or not.

To answer the first subquestion - whether and when redundancy helps to achieve reliability when the faults (including Byzantine ones) are independent - several mathematical models have been developed. A directed multigraph based model and a monotone graph based model are currently being analyzed (see Recent Accomplishments for details).

When viewing the input to the computation as the "sender" and the final output as a "receiver", one can link the problem of survivable computing with network security. Multiple-(vertex)-connected graphs have been used to achieve network reliability, for example. However, the algorithms developed in this context assume that all vertices (servers) know the graph, which is unrealistic in the scenario of wrapped servers. So, algorithms, if possible, to deal with the case the servers do not know this graph are being developed. One has already proven that these algorithms cannot be extended to a directed multigraph case (see Recent Accomplishments for details).

A main part of the second subquestion is whether replicated computation, possibly faulty, can be wrapped in such a way that one can detect an unlimited number of dependent faults. It is known that this is possible in the communication security context, using authentication methods. (This problem will mainly be studied during the third year of the project, in the context of a very general study on the impact of redundancy to achieve survivability in a malicious environment. This research will start during the second year.)

Recent Accomplishments: The following topics have been studied:

MILESTONE 1: MODELS TO STUDY SURVIVABILITY

Modeling a scenario in which the adversary is malicious should allow for a dynamic topology in which changes in the system may take place without the (non-faulty) processors being aware of it. It should also allow for the most general type of processor which could represent a simple gate, a software package, or a powerful computer. So memory, and the ability to perform complicated operations must be allowed for. The model should also describe the structure of the system at the appropriate level of abstraction: it must distinguish those aspects which are relevant to the computation and abstract out those aspects which are not essential. Such a model should offer the maximum flexibility to the designer. Previous models originated from the network community are unsuitable. First these models assume that the network is known, which is often not the case. Secondly, processors, programs, etc., are more than switching centers. Indeed, they many have more than one type of input.

Based on our analysis of redundant computation systems with multiple inputs, several models have been introduced and analyzed. Specifically, two models for independent faults were introduced: a model based on a "directed multi-graph with colored edges" and a monotone graph model. Two models for dependent faults have also been developed: a monotone graph with colored vertices and a monotone graph with partial orders on the colors of the vertices.

A directed multi-graph with colored edges model: A redundant computation system can be modeled by a directed multi-graph with colored edges. There is at least one input vertex and one output vertex. One assumes that there is at most one edge of any given color which joins distinct vertices. There are several possible applications for this model. For example, processors whose inputs have the same color need only use one input (when there are no faults). If the colors are different then the processor must use one input for each of the input colors, to carry out its computation (or whatever it is supposed to do). For example, the processors of the aviation control system need data from several sensors, as the airplane's speed, position, etc. Of course, this is only one of many possible applications.

A monotone graph model: A monotone graph is defined to be a directed graph with two types of vertices, labeled and-vertices and or-vertices. The graph must have at least one input (source) vertex and one output (sink) vertex. Input vertices may be regarded as and-vertices. This model and the previous one are equivalent in the sense that one can transform a directed multi-graph with colored edges into a monotone graph and vice-versa.

A monotone graph with colored vertices: When the faults are dependent one can model redundant computation by a monotone graph with colored vertices. There are several possible applications for this model. For example, all computers which run Windows 95 could be marked with one color. When a vertex fails, then all vertices with the same color will fail. In a fault-tolerant computation system, all vertices with the same color will have the same failure probability.

A monotone graph with partial orders on the colors of the vertices: The monotone graphs with colored vertices reflect the dependent faults in a natural way. This model however does not focus on the faults which are weakly dependent on one another and therefore it does not describe some of the finer aspects of dependent faults.

In this model one identifies different types of vertices by a color. A color could correspond with an operating system, or with the microprocessor used, etc. This operating system could be replicated and different replications correspond to different vertices in the model.

In many instances there is a hierarchy on the type of failures. For example, if the hardware of a computer has a design flaw, all operating systems that require that hardware may also fail. Also, if the operating system fails all application programs requiring that operating system will fail. So one has an hierarchy of types of vertices. This additional aspect can (more generally) be expressed by using a partial ordering on the colors. So, such a redundant computation system can be modeled by a monotone graph with colored vertices which in addition has a partial order on the colors.

MILESTONE 2: MULTIPLE INPUTS

The monotone graph model has been used to compare the design of reliable computer systems with one type of input versus the case with multiple inputs. While there is an efficient algorithm for using redundant networks (finding vertex disjoint paths in networks), our work shows that the equivalent problem in computation with multiple inputs seems hard (it is NP-hard). It follows that in the general case redundancy alone may not help to achieve survivability.

MILESTONE 3: PREVENTING MALICIOUS ATTACKS

Byzantine type of attacks in the case the graph is unknown have been described in the proposal. The goal of this research is to study when these can be prevented. One assumes that the redundant computation can be modeled by a network.

In the case the sender knows the network, but the receiver does not, the attacks can easily be prevented. The sender basically sends (together with the message and other data) via all paths used the following pair of information: (description of the graph, the paths used). This result was recently published in Electronics Letters.

The case in which each node has a public key and an edge in the unknown graph corresponds with a certificate of the public key is now described. The Byzantine type of attack allowing malicious processors to claim the existence of non-existent nodes or certificates, can also be prevented. An efficient algorithm has been described when the graph is 5/2k+1 connected, where k is the number of faulty nodes (when the graph is only 2k+1 connected an exponential time algorithm has also been found). Several measures are needed to prevent the attack to succeed. The details of the algorithm are described in a scientific paper (submitted).

Other cases are currently under investigation. This part of the research has also an impact on network security.

MILESTONE 4: CRITICAL STUDY OF REDUNDANCY

In traditional reliability, used in reliable network design for example, one has the following result. If the adversary can destroy k vertices, one needs at least k+1 vertices to obtain the desired output. Our result shows that in the case of multi-inputs, it is possible to protect against an adversary who can destroy ck vertices (c a constant) while having only a redundancy factor of k. This work implies that such redundant computation can be made less expensive than one would have estimated with the traditional model of redundant computation.

Current Plan: The current plan is to:
  1. study the potential applications of the models discussed in Recent Accomplishments. For example, these models may be used to identify the most critical tasks in redundant computations (as planned under Milestone 5) and to allocate the available resources to the most critical tasks. Identification of the most critical computer tasks allows one to replicate those in order to reduce the change an enemy can bring these tasks to a halt.
  2. extend the work on Milestone 3 (preventing malicious attacks). Military computer networks are so large and dynamic that routers need to update their knowledge of the network. By attacking this update, the operation of networks can be undermined as described in the proposal. The case where the nodes do not have a public key will be analyzed further to see whether the aforementioned attack can be prevented. This work has also an impact on fault-tolerant networks and network security.
  3. study the multiple input case in more details. Many military computer systems rely on multiple sensors. Redundancy alone is insufficient in this case (see Recent Accomplishments). How can such systems be made survivable?
  4. continue the research on Milestone 4. Since many computers run the same operating system and use the same hardware, replicating them may not eliminate the problem of faults. When such multiple dependent faults occur (for example in the navigation system, and its backups, of a plane), it would be useful for the user (for example the pilot) that the faults be detected (correction is then impossible). A study whether this is feasible will be started.
Technology Transition: A detailed report on the models that were introduced in the first year will be sent by postal mail. Copies of the Electronics Letters paper will also be sent and copies of submitted papers have been sent on a regular basis. The Air Force Research Lab has followed closely the work done on this topic.
Comments / Questions / Anything else you need: None