Algorithms: From Theory to Application
Guy E. Blelloch, Avrim L. Blum, Manuel Blum,
John D. La?erty and Daniel D. Sleator
Carnegie Mellon University
1
Overview
With the explosion in use and connectivity of computers and in the sizes of data sets being
collected, mathematically sophisticated algorithms are becoming increasingly crucial for a
wide range of real-world applications. These include problems in computational science,
data analysis, mathematical modeling, and communication services. In fact, sophisticated
algorithms already play a central role in a number of domains. Shortest path and interior
point methods are used in airline scheduling. String matching, traveling salesman and max-
?ow algorithms are used in computational biology. Algorithms for graph separators, maximal
independent set, Delaunay triangulation, and point location are used in ?nite element codes
for designing airplanes, cars, and space structures. Sophisticated algorithms are used for
managing information ?ow on the internet. The need for powerful algorithmic techniques in
a wide range of applications can only be expected to increase.
The transition from theory to practice, however, has been slow. It often takes twenty years
for an algorithm to make it from theory into an application, and there are still many cases
of naive algorithms being used where a better algorithm would surely improve performance
and productivity. Furthermore, similar algorithms or techniques are often reinvented many
times over in di?erent application areas.
Because of the need to accelerate the transition of ideas from theory to application and
back, and the need to better share ideas and code for basic algorithms and data structures
among diverse applications, we believe it is critical to support a broad-based research pro-
gram that focuses on algorithms and their transition to applications. This research would
span several of the application areas outlined in the ITR program solicitation, including in-
formation management, software, advanced computational science, and scalable information
infrastructure.
The core of this proposal is such a broad, coordinated e?ort aimed at the transition of
algorithms from Theory to Applications. The goals of this e?ort are to develop algorithms
and infrastructure in an integrated manner, and to have impact at a number of di?erent
levels. The kinds of impact range from the design of new algorithms addressing speci?c
critical application needs to the creation of a broader understanding that aims to improve
the transition process itself. We outline these in more detail below.
• At the lowest level, we propose speci?c research projects aimed at developing and ap-
plying sophisticated algorithms for a number of important application domains. These
include algorithms for mesh generation with applications to scienti?c simulations and
graphics, algorithms for indexing and searching needed for a number of data analysis
tasks, and algorithms that connect theoretical results in machine learning and cryp-
tography with the goal of producing a fundamentally new way for people to securely
1
authenticate to their computers. These projects will require the joint expertise of
multiple PIs along with researchers in corresponding application areas.
We expect the bene?ts of this work to occur at both ends of the pipeline: improved,
next generation applications based on sophisticated algorithms, and new theory as the
applications push back on the standard theoretical models and assumptions.
• At a somewhat higher level, we propose to create a center (with a small “c”) to which
researchers in application areas can come and draw on the knowledge and expertise of
the PIs to best integrate algorithmic techniques and principles into their own projects.
In order to make this truly successful, a signi?cant fraction of the funds requested in
this proposal are for partial support of graduate students who would become pro?cient
in both algorithm theory and the target application areas, and whose research would be
in explicitly pushing the connections through. These graduate students will be crucial
both in forming the “glue” that makes these connections succeed, and in themselves
ful?lling the highest-level goals of this project by becoming the next generation of
vertically-integrated algorithm researchers.
• At the highest level, in part through the lessons learned and in part through spe-
ci?c educational components, this proposal aims to create tools, infrastructure, and
knowledge bases that will improve the process of moving algorithms from theory to
applications. As one example, the course “Algorithms in the Real World” run by PI
Blelloch has already developed a set of web pages detailing how a number of types
of algorithms are used in various applications, as well as lessons learned, pitfalls, and
what turn out to be the crucial issues in the end. A new, extensible version of this
knowledge-base would provide support for theoreticians who want to push their new
techniques into applications, as well as for researchers in application areas who want
to learn about techniques that have proven useful in other domains. In addition, this
will provide support for educators on both sides who wish to add material to their
courses about the uses of algorithms in applications. We hope the end result to be
both a faster pipeline from algorithm design to application, and improved sharing of
algorithmic techniques across application areas.
We plan to develop other types of tools and infrastructure as well, based on lessons
learned through the more speci?c research projects, for example, a library of code for
manipulating triangulated meshes.
Having several researchers working together with common goals across all levels is a
crucial part of the proposed e?ort. Although funding individuals works well at the ?rst
level, the second and third levels depend critically on group collaboration. In particular,
working as a group will supply the following bene?ts.
• Many applications are composed of several parts that require di?erent kinds of al-
gorithms and theoretical techniques. For example, a web caching application might
require (1) expertise in cryptography to address the security issues (2) expertise in fast
data structures to implement the lookups quickly, (3) expertise on data compression to
store the cached ?les e?ciently, (4) expertise on parallel algorithms since web caches
2
run on parallel machines, and (5) expertise on approximation algorithms and heuris-
tics since under reasonable assumptions deciding where to place caches is NP-hard.
Although any one of the PIs could help on some subset of these topics, the applica-
tion would be best served by the PIs’ combined e?ort. Data analysis applications in
computational science have a similar ?avor.
• Many algorithms and theoretical techniques are used across a variety of applications.
For example singular value decomposition is used in computer vision, web searching,
motion planning, computational biology, and databases. Developing fast approxima-
tion algorithms for SVD, for example, could be useful across all these areas, but the
particular needs of each area are slightly di?erent (e.g. density of the matrices, reli-
ability of the input, and so on). As a group, we expect to be able to better identify
those key similarities and di?erences and better take advantage of them.
• The highest level infrastructure goals of developing tools and knowledge bases for
broadly improving the transition of algorithms from theory to application are best
served as a group. For example we can share the e?ort of designing and maintaining
web pages, of organizing workshops, and of developing a framework for libraries of
information useful to the community as a whole.
The interaction among the PIs and with the application areas will consist of weekly
meetings, joint students, jointly taught courses, sharing of code, and joint directed e?orts
with researchers in the areas.
2
Background
Each of the PIs has a strong history and interest in algorithms, both theoretical and applied.
Guy Blelloch is known for development of the NESL parallel programming language, as
well as fast parallel algorithms for sorting, Delaunay triangulation, list-ranking, and graph
connectivity. Avrim Blum is known for his research in algorithms for machine learning, data
mining, and combinatorial optimization, and is also developer of the Graphplan planning
algorithm, now used as the basis of many AI planning systems. Manuel Blum received the
ACM Turing Award for his work in the foundation of computational complexity theory and
its applications to cryptography and program checking. John La?erty is known for his work
in language modeling and information retrieval, and is co-developer (along with PI Sleator)
of the Link Grammar natural-language parser, now incorporated as one of the SPEC-2000
integer benchmarks. Daniel Sleator is winner of this year’s ACM Kanellakis “Theory and
Practice” award for the development of the Splay Tree data structure, used in applications
from network routers to the Lotus word processor. He has more recently been developing
algorithms for music analysis and natural language applications.
The proposed coordinated e?ort is motivated in part by a course we have developed ti-
tled “Algorithms in the Real World” (http://www.cs.cmu.edu/˜guyb/realworld.html), and
a workshop we are organizing on the same topic (http://www.cs.cmu.edu/˜realworld/). The
course covers algorithms for compression, cryptography, optimization, triangulation, VLSI,
computational biology, and indexing. In developing the material for the course we noticed
3
that theoretically interesting algorithms are being used in practice much more than most
people realize. We also noticed, however, that in recent years research on algorithms has
often taken on a life of its own within a variety of application areas. Although this re?ects
well on the importance of non-trivial algorithms in practice, it has meant that many tech-
niques are reinvented many times over, and that techniques are often developed in a more
speci?c framework than they should be, making them di?cult to generalize. We feel that
the algorithms/theory community can play a crucial role in improving the process by acting
as a clearinghouse of ideas, as well as continuing to develop basic theory, which will be useful
in the next generation of applications.
The remainder of the proposal is structured as follows. We begin with a sampling of
several speci?c research projects we intend to undertake. We then give more detail on
our higher-level objectives and plans for their realization. We discuss plans for integrating
research and education, and then end with a description of the PIs’ results from prior NSF
support.
3
Speci?c Research Directions
Speci?c application domains we plan to address, based on the background of the PIs and
the potential for collaboration with other research groups within Carnegie Mellon University
include scienti?c computation, data mining, robotics, graphics, security, computational biol-
ogy, and information retrieval. Three speci?c topics are described in more detail here. Two
of these (algorithms for indexing and searching, and algorithms for triangulated meshing)
already have a signi?cant track record; one (the human-oriented ID project) is completely
new to this proposal.
3.1
Algorithms for Indexing and Searching
E?cient indexing and searching of the quickly growing archives of on-line information will
require increasingly sophisticated algorithms. Information retrieval and management in-
volves almost every aspect of algorithms, including data structures, random walks, graph
algorithms, compression, cryptography, load-balancing, and many others. A survey of some
of the important and technically interesting research challenges in this area was presented
at a recent tutorial at FOCS by Broder and Henzinger [37].
Certainly many of the algorithmic aspects involved in classical information retrieval for
Boolean queries and the vector space model of Salton [59] are by now well understood and
engineered in practical systems [63]. However, even in this well studied area, recent research
on algorithms may be able to yield improvements. For example, binary operations on ordered
sets (union, intersection, etc.) using random balanced binary trees and parallel algorithms
can result in signi?cant speed-ups over more standard techniques [20]. As another example,
recent research on document ranking has made use of the singular-value decomposition for
matrices [7, 45, 8], and these methods have much in common with graph separator algorithms
[41], Monte-Carlo algorithms for low-rank approximations [39] and other topics of active
interest in the algorithms community.
4
But information retrieval and management is rapidly evolving in ways that will raise
many new algorithmic issues. For example, new probabilistic approaches to information re-
trieval are now emerging [57, 6, 53] and the algorithmic aspects of scaling these approaches
to very large data collections are virtually unexplored. Moreover, the way in which a user
is able to search for information is evolving. Current research e?orts are going beyond sim-
ple Boolean queries to more sophisticated question answering systems and context-sensitive
queries. E?ciently handling such searching paradigms will require new ways of indexing and
retrieving data. One of the goals of this research project will be to open communication
between researchers in the information retrieval and algorithms communities, so that good
solutions to these emerging problems can be found quickly, without “reinventing the wheel”
on many of the more basic algorithmic issues.
3.2
Human Oriented ID Project
Almost everyone who uses a computer is familiar with the notion of authentication: the pro-
cess by which a user convinces a computer that the user is who he says he is. Authentication
is usually done by presenting a password. Unfortunately, passwords, including ?ngerprints
and retinal scans, su?er the danger that they can be recorded and later replayed by an
eavesdropper.
One way to make authentication secure against eavesdropping is to use a challenge-
response system. In challenge-response, a computer presents one of a large set of possible
challenges to a user, who then provides a correct response. If the system is properly designed,
an eavesdropper who observes a large number of challenge-response pairs will not be able
to impersonate the user on a new challenge. Secure challenge-response protocols have been
around for over a decade, based on public-key cryptography or zero-knowledge proofs.
A problem with these challenge-response systems is that they require the user to per-
form computations that themselves need a separate computing device or smart-card. The
proposed Human Oriented ID Project (HumanOID) headed by PIs M. Blum and A. Blum,
is to develop a challenge-response protocol that a user can perform easily in his head, with-
out need for any added computing device. In addition to the advantage over passwords
in terms of security against eavesdropping, such a system would also have the potential to
actually reduce cognitive load on users, because challenge-response systems can remove the
need to remember multiple unrelated passwords for security purposes. (For a description of
the growing problem caused by this cognitive load, see [1]).
The approach we are taking to the design of such a system is based on a connection be-
tween concepts in the area of Machine Learning and Cryptography. The connection between
machine learning and authentication is this: In a challenge-response system, an eavesdropper
is trying to ?gure out how to create a valid response to a new challenge by observing many
challenge-response pairs. If the challenges are strings or images from some ?xed distribution,
and responses are produced by some function that the user and computer both share, then
the eavesdropper is essentially acting as a learning algorithm. To prevent the eavesdropper
from learning, a good challenge-response authentication protocol should therefore be based
on a “simple but hard” learning problem. Formal connections between these areas have
been made already in [24]. This project intends to turn some of these high-level theoretical
5
connections into a real system usable by real people.
Producing a successful system involves a number of di?erent types of challenges. At
the theoretical level, the notions of “hardness” in machine learning and cryptography do
not perfectly mesh, and the most e?cient way to convert problems from one domain to
the other is not yet clear. At the usability level, there are a variety of human interface
challenges involved in making the system be easily usable without compromising security. We
anticipate, therefore, that making this project succeed will require substantial interaction,
not only between the PIs, but also with researchers with expertise in human interfaces,
and in many of the human aspects of security. Funding of this proposal will support this
interaction not only through the support of the PIs directly, but also through the partial
support of graduate students who will have a foot in several camps and be able to make
these connections work.
3.3
Algorithms for Triangulated Meshing
Triangulated meshes are becoming increasingly important in a wide variety of application ar-
eas, including graphics, geographic information systems, computer vision, data compression,
and engineering and scienti?c simulations. Triangulated meshes lead to many interesting
theoretical questions, and make use of a large number of basic algorithms as subcompo-
nents. For example maximum-?ow and maximal-independent sets on graphs are both used
in mesh coarsening algorithms [54, 2]. Although the basic techniques such as Delaunay tri-
angulation are shared among all the application areas, much of the code and many of the
more advanced techniques are developed independently.
As part of the proposed work there are several areas we plan to explore. First, we have
recently been working on a purely “functional” representation of simplicial complexes. As
motivated in the work of Kaplan and Tarjan [42], purely functional data structures have
the advantage that they are fully persistent. Furthermore they can be accessed or updated
in parallel without worry of con?icts. This has important applications in graphics and
other areas where it can be important to display a mesh while it is being updated, or to
keep a history of previous meshes so that they can be animated over time. In previous
work we have shown that one can match the ?(n log n) lower bound for 3D convex hull
using a purely functional triangulation (2D simplicial complex), and we have developed a
library of routines for manipulating purely functional simplicial complexes [11]. As part
of the proposed work we plan to explore the use of persistent simplicial complexes and
other persistent geometric data structures as part of various applications, including a 3D
version of Ruppert’s algorithm [58], a ?nite element package, and a multiresolution graphics
code. Some other areas we plan to explore include studying how meshes can be used for
data-compression, whether there are e?cient parallel variants of Ruppert’s algorithm, and
whether purely functional triangulations can be used to simplify developing kinetic data
structures [5].
The PIs plan to collaborate with three researchers/projects at Carnegie Mellon to see
if common tools can be shared among the groups. The researchers are Omar Ghattas of
the Quake project, who is making use of triangulated meshes to simulate ground motion
in earthquakes, Paul Heckbert of the graphics group, who is using triangulated meshes for
6
surface approximation and multi-resolution rendering, and Takeo Kanade of the vision group,
who is using triangulated meshes for 3d surface reconstruction from stereo vision. Blelloch
has already been working with Omar Ghattas on making use of a parallel Delaunay code [14]
for dynamically meshing highly deforming structures [9].
One of the outcomes we expect of the proposed work is a library of routines for manip-
ulating triangulated surfaces and simplicial complexes. This will di?er from the interface
supplied by CGAL [56], or LEDA [52] in that it would support persistent implementations
and would include application-level code.
4
Infrastructure and Higher-level Objectives
Making real, substantive connections between di?erent research areas is rarely an easy pro-
cess. Even areas that share much in common at the high level have di?erent perspectives
and language that make dialogue more di?cult than one would expect. For this reason,
two higher-level components of this proposal are (1) to create infrastructure locally that will
make these connections easier, and (2) to create tools and knowledge-bases that will ease
the process of moving algorithms into applications for the community as a whole.
4.1
Local infrastructure
At the local level we plan several initiatives to foster collaboration between the PIs and
application areas. These include (1) joint graduate students who are funded between areas,
(2) weekly project lunches with researchers in application areas, (3) courses covering cross-
disciplinary topics, and (4) undergraduate projects.
The largest component of the funds requested in this proposal are for partial (50%)
support of graduate students who would become pro?cient in both algorithm analysis and
in an application area. These students would work with the PIs and with researchers in
their other area, and form the “glue” that makes these connections succeed. Our experience
is that graduate students are crucial to most of the interdisciplinary collaborations we have
been involved in. However it is also di?cult for students to develop the independence to
engage in cross disciplinary research. By working as a group project with a large enough
collection of faculty and a critical mass of students we expect to be able to attract high-
quality students. We expect the students who work on the project to range from pure
theoreticians to experimentalists.
We plan to have weekly project lunches where researchers in areas with algorithmic needs
would come and present challenges, get advice, and begin to build a working collaboration.
The graduate students would also use these meetings to present ideas and insights related to
the use of algorithms in their application area. As an example of the type of collaboration
that might occur, we could have someone from Electrical Engineering come in and describe
their work on recon?gurable computing (i.e. chips in which the circuits can be changed
dynamically). After some discussion, we discover various areas of potential collaboration.
For example, the researcher is using various heuristics for optimizing the layout of the vir-
tual circuits on the physical circuits, and we are able to reduce their problem into known
graph problems for which recently developed approximation algorithms can be used; or,
7
we notice that some new algorithms for dynamically maintaining geometric con?gurations
could be used to update their data structures more rapidly; or, on the ?ip side, we realize
that recon?gurable circuits will make some application of zero-knowledge proofs much more
practical.
We expect the meetings to also expose similarities among problems in diverse areas. As
one example, PI M. Blum recently had the experience at City University of Hong Kong of
connecting a linguist with a computational biologist who had nearly identical algorithmic
problems. The linguist (Bill Wang) studies the evolution of languages. Armed with a tape
recorder, he interviews groups of linguistically equivalent people and constructs a representa-
tion for their language. He has software to infer the most likely evolutionary tree for modern
languages. The computational biologist (LuSheng Wang) builds algorithms for using DNA
to hypothesize evolutionary trees for biological species. Because both scientists worked in
di?erent departments (of the same University!) on di?erent subjects, they were not aware
that they were working on essentially the same problem. Note that we are not claiming this
is the ?rst time anyone has recognized the connections between these two areas; the point
is that the researchers in the application areas had not realized it, because they were not
operating at the algorithmic level of abstraction.
We plan to continue to o?er courses that cover algorithms from theory to applications. In
addition to the “Algorithms in the Real World” course, such courses would include more topic
speci?c courses, such as machine-learning, algorithms for computer networks, algorithms in
scienti?c computing, and security protocols. The idea would be to teach these courses jointly
with one of the PIs and a faculty member from the applied side of the topic. Our experience
is that the faculty learn as much from such an experience as the students, and that it makes
future collaboration signi?cantly easier.
In their senior year, our undergraduates have the option of doing a senior project. We
plan to organize a collection of projects each year which will involve some application of a
new and theoretically interesting algorithm. We also plan to initiate a a summer institute
for college undergraduates of computer science as described brie?y in Section 5.
4.2
Global infrastructure
At the global level we plan to develop tools and knowledge-bases that will ease the process
of moving algorithms into applications for the community as a whole. This will include
course material, a database for cross-referencing where algorithms are used, workshops, and
dissemination of our experiences. We discuss a cross-referencing database here.
We propose developing an online database in which the user can make queries about
particular algorithms, data-structures, or theoretical techniques and ?nd the applications
that make use of them. Similarly the user could query particular applications and ?nd what
algorithms are used in the application. As an example of its use, consider balanced-tree data
structures, and more speci?cally, Splay Trees. Let’s say that someone were interested in the
answer to the following questions:
1. Where can I ?nd an implementation of Splay Trees?
2. In what applications are Splay Trees used?
8
3. Are Splay Trees used more often in practice than other balanced trees or skip-lists?
4. What reasons do users cite for using Splay Trees, or the other balanced structures?
Although existing repositories, such as Skiena’s Stony Brook Algorithms Repository [60],
can help answer the ?rst question, it is very di?cult to ?nd answers to the others. Having
researched the topic, we can answer most of these questions: (2) Splay Trees are used in
the virtual memory system of NT, in the GCC compiler, in the Linux run time linker, in
the SQUID web proxy cache, in Fore systems and AT&T network routers, and in the most
widely deployed version of malloc, as some examples; (3) Splay Trees appear to be much
more commonly used than other balanced tree structures or skip-lists, although some key
packages such as LEDA [52] use skip lists; (4) Application users typically cite the locality
feature of Splay Trees, the simplicity of the code, the speed, and that they are easily meldable
as reasons for using them. Collecting this information, however, was a large amount of work
often requiring personal conversations with implementors of the systems. There is currently
no natural place to put this knowledge.
A database that is aimed at answering the latter three questions for other data structures
and algorithms could be a great resource. In particular it could help researchers in the
following ways:
• Determine the types of input that are used by a particular algorithm in various con-
texts. This would be particularly useful for algorithms whose performance is data
dependent.
• Determine what criteria are important to various applications. For example it might be
important to deal with noisy inputs, or for certain applications it might be important
that the algorithm can be run in limited space (e.g. on a hand-held).
• Make it easier to contact users of the algorithm to start collaborations with them.
• To use as examples in courses on algorithms to help motivate why an algorithm is
important, or to tie in the algorithm with other material the students are learning in
other classes, e.g. compilers.
5
Integration of Research and Education
We plan to integrate the proposed work with educational activities in several ways. First, we
plan to continue the “Algorithms in the Real World” course, and to develop a textbook and
other materials based on it. The proposed research will help guide the course, and similarly,
course projects are good opportunities to have students implement new algorithms and try
out new ideas that will contribute to the research. Second, we plan to run workshops on
applied algorithms; we are running one such workshop this May. These workshops will bring
together researchers in academia and industry who are developing and using algorithms in
their work. Third, we plan to initiate a summer institute for college undergraduates of
computer science. It would introduce junior-senior level college undergraduates from around
the world to great ideas in computer science, and to research in algorithms in particular.
9
We also believe that our project has a number of features that make it ideally suited to
help increase the participation of underrepresented groups in computer science: The area
itself, “algorithms in the real world,” has broad appeal and draws on a broad range of
interests and expertise. The ?eld is new and relatively accessible. Our university is strongly
committed to increasing diversity. In computer science we have already implemented several
successful programs. For a number of years, our colleagues (under the direction of Allan
Fisher) have run summer programs for high school teachers of advanced placement computer
science which, in addition to providing technical expertise, have stressed ways of increasing
the participation of young women. As a consequence, this year over 37% of the ?rst year
class in the Computer Science Department is female. Carnegie Mellon President Cohon has
provided a large grant for projects (under the direction of Lenore Blum) aimed at ensuring
the ongoing success of our female students. Building on these experiences, we are looking
at ways of increasing the participation of other underrepresented groups. One way will be
to actively recruit such participation in our upcoming conference “Algorithms in the Real
World” and in the proposed summer program in this area for undergraduates.
We anticipate that the web site to be developed in conjunction with our project will serve
as a resource for many undergraduate programs across the country. By providing exciting
new “real world” material that can be readily incorporated into various courses we hope to
be able to impact programs and students beyond our campus, including those at minority
institutions.
6
Research Results and Impact
The results and expected impact of the proposed research will include the following.
• We will develop new algorithms and techniques, with an emphasis on techniques that
span several application areas. We also plan to implement many of the algorithms we
develop and make them available as libraries.
• We will explore new theoretical problems by considering common ideas that appear in
di?erent application areas.
• We will generate a better understanding of where and how algorithms are used in
practice. This will include an on-line database of algorithms and their applications.
• We will generate on-line course material. This material will be available via the web
to undergraduate and research institutions across the country.
• We expect that our research will lead to better sharing of algorithms and techniques
among di?erent application areas.
7
The Carnegie Mellon Environment
The School of Computer Science at Carnegie Mellon is an ideal environment for the proposed
research because of the long emphasis of interdiciplinary programs and the extensive ongoing
10
Add New Comment