Saturday, February 9, 2008

Dynamic Programming Algorithms

Dynamic Programming Algorithms

Dynamic programming is a stage-wise search method suitable for optimization problems whose solutions may be viewed as the result of a sequence of decisions. The most attractive property of this strategy is that during the search for a solution it avoids full enumeration by pruning early partial decision solutions that cannot possibly lead to optimal solution. In many practical situations, this strategy hits the optimal solution in a polynomial number of decision steps. However, in the worst case, such a strategy may end up performing full enumeration.

Dynamic programming takes advantage of the duplication and arrange to solve each subproblem only once, saving the solution (in table or something) for later use. The underlying idea of dynamic programming is: avoid calculating the same stuff twice, usually by keeping a table of known results of subproblems. Unlike divide-and-conquer, which solves the subproblems top-down, a dynamic programming is a bottom-up technique.

Bottom-up means
Start with the smallest subproblems.
Combining theirs solutions obtain the solutions to subproblems of increasing size.
Until arrive at the solution of the original problem.


The Principle of Optimality
The dynamic programming relies on a principle of optimality. This principle states that in an optimal sequence of decisions or choices, each subsequence must also be optimal. For example, in matrix chain multiplication problem, not only the value we are interested in is optimal but all the other entries in the table are also represent optimal.
The principle can be related as follows: the optimal solution to a problem is a combination of optimal solutions to some of its subproblems.
The difficulty in turning the principle of optimally into an algorithm is that it is not usually obvious which subproblems are relevant to the problem under consideration.

Dynamic-Programming Solution
to the 0-1 Knapsack Problem

Problem Statement A thief robbing a store and can carry a maximal weight of W into their knapsack. There are n items and ith item weigh wi and is worth vi dollars. What items should thief take?

There are two versions of problem
Fractional knapsack problem The setup is same, but the thief can take fractions of items, meaning that the items can be broken into smaller pieces so that thief may decide to carry only a fraction of xi of item i, where 0 ≤ xi ≤ 1.
0-1 knapsack problem The setup is the same, but the items may not be broken into smaller pieces, so thief may decide either to take an item or to leave it (binary choice), but may not take a fraction of an item.

Fractional knapsack problem
Exhibit greedy choice property. Þ Greedy algorithm exists.
Exhibit optimal substructure property.
Þ
0-1 knapsack problem
Exhibit No greedy choice property. Þ No greedy algorithm exists.
Exhibit optimal substructure property.
Only dynamic programming algorithm exists.



Dynamic-Programming Solution to the 0-1 Knapsack Problem

Let i be the highest-numbered item in an optimal solution S for W pounds. Then S` = S - {i} is an optimal solution for W - wi pounds and the value to the solution S is Vi plus the value of the subproblem.
We can express this fact in the following formula: define c[i, w] to be the solution for items 1,2, . . . , i and maximum weight w. Then


0
if i = 0 or w = 0
c[i,w] =
c[i-1, w]
if wi ≥ 0

max [vi + c[i-1, w-wi], c[i-1, w]}
if i>0 and w ≥ wi


This says that the value of the solution to i items either include ith item, in which case it is vi plus a subproblem solution for (i - 1) items and the weight excluding wi, or does not include ith item, in which case it is a subproblem's solution for (i - 1) items and the same weight. That is, if the thief picks item i, thief takes vi value, and thief can choose from items w - wi, and get c[i - 1, w - wi] additional value. On other hand, if thief decides not to take item i, thief can choose from item 1,2, . . . , i- 1 upto the weight limit w, and get c[i - 1, w] value. The better of these two choices should be made.
Although the 0-1 knapsack problem, the above formula for c is similar to LCS formula: boundary values are 0, and other values are computed from the input and "earlier" values of c. So the 0-1 knapsack algorithm is like the LCS-length algorithm given in CLR for finding a longest common subsequence of two sequences.
The algorithm takes as input the maximum weight W, the number of items n, and the two sequences v = and w = . It stores the c[i, j] values in the table, that is, a two dimensional array, c[0 . . n, 0 . . w] whose entries are computed in a row-major order. That is, the first row of c is filled in from left to right, then the second row, and so on. At the end of the computation, c[n, w] contains the maximum value that can be picked into the knapsack.


Dynamic-0-1-knapsack (v, w, n, W)
FOR w = 0 TO W DO c[0, w] = 0FOR i=1 to n DO c[i, 0] = 0 FOR w=1 TO W DO IFf wi ≤ w THEN IF vi + c[i-1, w-wi] THEN c[i, w] = vi + c[i-1, w-wi] ELSE c[i, w] = c[i-1, w] ELSE c[i, w] = c[i-1, w]

The set of items to take can be deduced from the table, starting at c[n. w] and tracing backwards where the optimal values came from. If c[i, w] = c[i-1, w] item i is not part of the solution, and we are continue tracing with c[i-1, w]. Otherwise item i is part of the solution, and we continue tracing with c[i-1, w-W].

Analysis
This dynamic-0-1-kanpsack algorithm takes θ(nw) times, broken up as follows: θ(nw) times to fill the c-table, which has (n +1).(w +1) entries, each requiring θ(1) time to compute. O(n) time to trace the solution, because the tracing process starts in row n of the table and moves up 1 row at each step.

Dynamic-Programming Algorithm
for the Activity-Selection Problem

An activity-selection is the problem of scheduling a resource among several competing activity.
Problem Statement Given a set S of n activities with and start time, Si and fi, finish time of an ith activity. Find the maximum size set of mutually compatible activities.

Compatible Activities
Activities i and j are compatible if the half-open internal [si, fi) and [sj, fj) do not overlap, that is, i and j are compatible if si ≥ fj and sj ≥ fi

Dynamic-Programming Algorithm
The finishing time are in a sorted array f[i] and the starting times are in array s[i]. The array m[i] will store the value mi, where mi is the size of the largest of mutually compatible activities among activities {1, 2, . . . , i}. Let BINARY-SEARCH(f, s) returns the index of a number i in the sorted array f such that f(i) ≤ s ≤ f[i + 1].

for i =1 to n do m[i] = max(m[i-1], 1+ m [BINARY-SEARCH(f, s[i])]) We have P(i] = 1 if activity i is in optimal selection, and P[i] = 0 otherwise i = n while i > 0 do if m[i] = m[i-1] then P[i] = 0 i = i - 1 else i = BINARY-SEARCH (f, s[i]) P[i] = 1


Analysis
The running time of this algorithm is O(n lg n) because of the binary search which takes lg(n) time as opposed to the O(n) running time of the greedy algorithm. This greedy algorithm assumes that the activities already sorted by increasing time.

Divide-and-Conquer Algorithm

Divide-and-Conquer Algorithm

Divide-and-conquer is a top-down technique for designing algorithms that consists of dividing the problem into smaller subproblems hoping that the solutions of the subproblems are easier to find and then composing the partial solutions into the solution of the original problem.

Little more formally, divide-and-conquer paradigm consists of following major phases:
Breaking the problem into several sub-problems that are similar to the original problem but smaller in size,
Solve the sub-problem recursively (successively and independently), and then
Combine these solutions to subproblems to create a solution to the original problem.

Binary Search (simplest application of divide-and-conquer)
Binary Search is an extremely well-known instance of divide-and-conquer paradigm. Given an ordered array of n elements, the basic idea of binary search is that for a given element we "probe" the middle element of the array. We continue in either the lower or upper segment of the array, depending on the outcome of the probe until we reached the required (given) element.
Problem Let A[1 . . . n] be an array of non-decreasing sorted order; that is A [i] ≤ A [j] whenever 1 ≤ i ≤ j ≤ n. Let 'q' be the query point. The problem consist of finding 'q' in the array A. If q is not in A, then find the position where 'q' might be inserted.
Formally, find the index i such that 1 ≤ i ≤ n+1 and A[i-1] < x ≤ A[i].

Sequential Search
Look sequentially at each element of A until either we reach at the end of an array A or find an item no smaller than 'q'.
Sequential search for 'q' in array A
for i = 1 to n do if A [i] ≥ q then return index i return n + 1

Analysis
This algorithm clearly takes a θ(r), where r is the index returned. This is Ω(n) in the worst case and O(1) in the best case.
If the elements of an array A are distinct and query point q is indeed in the array then loop executed (n + 1) / 2 average number of times. On average (as well as the worst case), sequential search takes θ(n) time.


Binary Search
Look for 'q' either in the first half or in the second half of the array A. Compare 'q' to an element in the middle, n/2 , of the array. Let k = n/2 . If q ≤ A[k], then search in the A[1 . . . k]; otherwise search T[k+1 . . n] for 'q'. Binary search for q in subarray A[i . . j] with the promise that
A[i-1] < x ≤ A[j]If i = j then return i (index)k= (i + j)/2if q ≤ A [k] then return Binary Search [A [i-k], q] else return Binary Search [A[k+1 . . j], q]

Analysis
Binary Search can be accomplished in logarithmic time in the worst case , i.e., T(n) = θ(log n). This version of the binary search takes logarithmic time in the best case.


Iterative Version of Binary Search
Interactive binary search for q, in array A[1 . . n]
if q > A [n] then return n + 1i = 1;j = n;while i < j do k = (i + j)/2 if q ≤ A [k] then j = k else i = k + 1return i (the index)

Analysis
The analysis of Iterative algorithm is identical to that of its recursive counterpart.

Greedy Introduction

Greedy Introduction
Greedy algorithms are simple and straightforward. They are shortsighted in their approach in the sense that they take decisions on the basis of information at hand without worrying about the effect these decisions may have in the future. They are easy to invent, easy to implement and most of the time quite efficient. Many problems cannot be solved correctly by greedy approach. Greedy algorithms are used to solve optimization problems
Greedy Approach
Greedy Algorithm works by making the decision that seems most promising at any moment; it never reconsiders this decision, whatever situation may arise later.
As an example consider the problem of "Making Change".
Coins available are:
dollars (100 cents)
quarters (25 cents)
dimes (10 cents)
nickels (5 cents)
pennies (1 cent)

Problem Make a change of a given amount using the smallest possible number of coins.

Informal Algorithm
Start with nothing.
at every stage without passing the given amount.
add the largest to the coins already chosen.

Formal Algorithm
Make change for n units using the least possible number of coins.
MAKE-CHANGE (n) C ← {100, 25, 10, 5, 1} // constant. Sol ← {}; // set that will hold the solution set. Sum ← 0 sum of item in solution set WHILE sum not = n x = largest item in set C such that sum + x ≤ n IF no such item THEN RETURN "No Solution" S ← S {value of x} sum ← sum + x RETURN S

Example Make a change for 2.89 (289 cents) here n = 2.89 and the solution contains 2 dollars, 3 quarters, 1 dime and 4 pennies. The algorithm is greedy because at every stage it chooses the largest coin without worrying about the consequences. Moreover, it never changes its mind in the sense that once a coin has been included in the solution set, it remains there.

Characteristics and Features of Problems solved by Greedy Algorithms
To construct the solution in an optimal way. Algorithm maintains two sets. One contains chosen items and the other contains rejected items.
The greedy algorithm consists of four (4) function.
A function that checks whether chosen set of items provide a solution.
A function that checks the feasibility of a set.
The selection function tells which of the candidates is the most promising.
An objective function, which does not appear explicitly, gives the value of a solution.

Structure Greedy Algorithm
Initially the set of chosen items is empty i.e., solution set.
At each step
item will be added in a solution set by using selection function.
IF the set would no longer be feasible
reject items under consideration (and is never consider again).
ELSE IF set is still feasible THEN
add the current item.

Definitions of feasibility
A feasible set (of candidates) is promising if it can be extended to produce not merely a solution, but an optimal solution to the problem. In particular, the empty set is always promising why? (because an optimal solution always exists)
Unlike Dynamic Programming, which solves the subproblems bottom-up, a greedy strategy usually progresses in a top-down fashion, making one greedy choice after another, reducing each problem to a smaller one.
Greedy-Choice Property
The "greedy-choice property" and "optimal substructure" are two ingredients in the problem that lend to a greedy strategy.
Greedy-Choice Property
It says that a globally optimal solution can be arrived at by making a locally optimal choice.

Knapsack Problem

Statement A thief robbing a store and can carry a maximal weight of w into their knapsack. There are n items and ith item weigh wi and is worth vi dollars. What items should thief take?
There are two versions of problem
Fractional knapsack problemThe setup is same, but the thief can take fractions of items, meaning that the items can be broken into smaller pieces so that thief may decide to carry only a fraction of xi of item i, where 0 ≤ xi ≤ 1.
Exhibit greedy choice property.
Greedy algorithm exists.
Exhibit optimal substructure property.
?????
0-1 knapsack problemThe setup is the same, but the items may not be broken into smaller pieces, so thief may decide either to take an item or to leave it (binary choice), but may not take a fraction of an item.
Exhibit No greedy choice property.
No greedy algorithm exists.
Exhibit optimal substructure property.
Only dynamic programming algorithm exists.

Dynamic-Programming Solution
to the 0-1 Knapsack Problem

Let i be the highest-numbered item in an optimal solution S for W pounds. Then S`= S - {i} is an optimal solution for W-wi pounds and the value to the solution S is Vi plus the value of the subproblem.

We can express this fact in the following formula: define c[i, w] to be the solution for items 1,2, . . . , i and maximum weight w. Then


0
if i = 0 or w = 0
c[i,w] =
c[i-1, w]
if wi ≥ 0

max [vi + c[i-1, w-wi], c[i-1, w]}
if i>0 and w ≥ wi

This says that the value of the solution to i items either include ith item, in which case it is vi plus a subproblem solution for (i-1) items and the weight excluding wi, or does not include ith item, in which case it is a subproblem's solution for (i-1) items and the same weight. That is, if the thief picks item i, thief takes vi value, and thief can choose from items w-wi, and get c[i-1, w-wi] additional value. On other hand, if thief decides not to take item i, thief can choose from item 1,2, . . . , i-1 upto the weight limit w, and get c[i-1, w] value. The better of these two choices should be made.
Although the 0-1 knapsack problem, the above formula for c is similar to LCS formula: boundary values are 0, and other values are computed from the input and "earlier" values of c. So the 0-1 knapsack algorithm is like the LCS-length algorithm given in CLR-book for finding a longest common subsequence of two sequences.
The algorithm takes as input the maximum weight W, the number of items n, and the two sequences v = and w = . It stores the c[i, j] values in the table, that is, a two dimensional array, c[0 . . n, 0 . . w] whose entries are computed in a row-major order. That is, the first row of c is filled in from left to right, then the second row, and so on. At the end of the computation, c[n, w] contains the maximum value that can be picked into the knapsack.

Dynamic-0-1-knapsack (v, w, n, W)
for w = 0 to W do c[0, w] = 0for i = 1 to n do c[i, 0] = 0 for w = 1 to W do if wi ≤ w then if vi + c[i-1, w-wi] then c[i, w] = vi + c[i-1, w-wi] else c[i, w] = c[i-1, w] else c[i, w] = c[i-1, w]

The set of items to take can be deduced from the table, starting at c[n. w] and tracing backwards where the optimal values came from. If c[i, w] = c[i-1, w] item i is not part of the solution, and we are continue tracing with c[i-1, w]. Otherwise item i is part of the solution, and we continue tracing with c[i-1, w-W].

Analysis
This dynamic-0-1-kanpsack algorithm takes θ(nw) times, broken up as follows:
θ(nw) times to fill the c-table, which has (n+1).(w+1) entries, each requiring θ(1) time to compute. O(n) time to trace the solution, because the tracing process starts in row n of the table and moves up 1 row at each step.

Greedy Solution to the
Fractional Knapsack Problem

There are n items in a store. For i =1,2, . . . , n, item i has weight wi > 0 and worth vi > 0. Thief can carry a maximum weight of W pounds in a knapsack. In this version of a problem the items can be broken into smaller piece, so the thief may decide to carry only a fraction xi of object i, where 0 ≤ xi ≤ 1. Item i contributes xiwi to the total weight in the knapsack, and xivi to the value of the load.
In Symbol, the fraction knapsack problem can be stated as follows.maximize nSi=1 xivi subject to constraint nSi=1 xiwi ≤ W
It is clear that an optimal solution must fill the knapsack exactly, for otherwise we could add a fraction of one of the remaining objects and increase the value of the load. Thus in an optimal solution nSi=1 xiwi = W.

Greedy-fractional-knapsack (w, v, W)
FOR i =1 to n do x[i] =0weight = 0while weight < W do i = best remaining item IF weight + w[i] ≤ W then x[i] = 1 weight = weight + w[i] else x[i] = (w - weight) / w[i] weight = Wreturn x


Analysis
If the items are already sorted into decreasing order of vi / wi, then the while-loop takes a time in O(n); Therefore, the total time including the sort is in O(n log n).
If we keep the items in heap with largest vi/wi at the root. Then
creating the heap takes O(n) time
while-loop now takes O(log n) time (since heap property must be restored after the removal of root)
Although this data structure does not alter the worst-case, it may be faster if only a small number of items are need to fill the knapsack.

One variant of the 0-1 knapsack problem is when order of items are sorted by increasing weight is the same as their order when sorted by decreasing value.
The optimal solution to this problem is to sort by the value of the item in decreasing order. Then pick up the most valuable item which also has a least weight. First, if its weight is less than the total weight that can be carried. Then deduct the total weight that can be carried by the weight of the item just pick. The second item to pick is the most valuable item among those remaining. Keep follow the same strategy until thief cannot carry more item (due to weight).

Proof
One way to proof the correctness of the above algorithm is to prove the greedy choice property and optimal substructure property. It consist of two steps. First, prove that there exists an optimal solution begins with the greedy choice given above. The second part prove that if A is an optimal solution to the original problem S, then A - a is also an optimal solution to the problem S - s where a is the item thief picked as in the greedy choice and S - s is the subproblem after the first greedy choice has been made. The second part is easy to prove since the more valuable items have less weight.Note that if v` / w` , is not it can replace any other because w` <> v. □

Theorem The fractional knapsack problem has the greedy-choice property.
Proof Let the ratio v`/w` is maximal. This supposition implies that v`/w` ≥ v/w for any pair (v, w), so v`v / w > v for any (v, w). Now Suppose a solution does not contain the full w` weight of the best ratio. Then by replacing an amount of any other w with more w` will improve the value. □

An Activity Selection Problem

An activity-selection is the problem of scheduling a resource among several competing activity.
Problem Statement
Given a set S of n activities with and start time, Si and fi, finish time of an ith activity. Find the maximum size set of mutually compatible activities.

Compatible Activities
Activities i and j are compatible if the half-open internal [si, fi) and [sj, fj) do not overlap, that is, i and j are compatible if si ≥ fj and sj ≥ fi

Greedy Algorithm for Selection Problem
I. Sort the input activities by increasing finishing time. f1 ≤ f2 ≤ . . . ≤ fn
II. Call GREEDY-ACTIVITY-SELECTOR (s, f)
n = length [s]
A={i}
j = 1
for i = 2 to n
do if si ≥ fj
then A= AU{i}
j = i
return set A


Operation of the algorithm
Let 11 activities are given S = {p, q, r, s, t, u, v, w, x, y, z} start and finished times for proposed activities are (1, 4), (3, 5), (0, 6), 5, 7), (3, 8), 5, 9), (6, 10), (8, 11), (8, 12), (2, 13) and (12, 14).
A = {p} Initialization at line 2A = {p, s} line 6 - 1st iteration of FOR - loopA = {p, s, w} line 6 -2nd iteration of FOR - loopA = {p, s, w, z} line 6 - 3rd iteration of FOR-loopOut of the FOR-loop and Return A = {p, s, w, z}


Analysis
Part I requires O(n lg n) time (use merge of heap sort).Part II requires θ(n) time assuming that activities were already sorted in part I by their finish time.


Correctness
Note that Greedy algorithm do not always produce optimal solutions but GREEDY-ACTIVITY-SELECTOR does.

Theorem Algorithm GREED-ACTIVITY-SELECTOR produces solution of maximum size for the activity-selection problem.

Proof Idea Show the activity problem satisfied
Greedy choice property.
Optimal substructure property.

Proof
Let S = {1, 2, . . . , n} be the set of activities. Since activities are in order by finish time. It implies that activity 1 has the earliest finish time. Suppose, AS is an optimal solution and let activities in A are ordered by finish time. Suppose, the first activity in A is k.If k = 1, then A begins with greedy choice and we are done (or to be very precise, there is nothing to proof here).If k 1, we want to show that there is another solution B that begins with greedy choice, activity 1.Let B = A - {k} {1}. Because f1 fk, the activities in B are disjoint and since B has same number of activities as A, i.e., A = B, B is also optimal.
Once the greedy choice is made, the problem reduces to finding an optimal solution for the problem. If A is an optimal solution to the original problem S, then A` = A - {1} is an optimal solution to the activity-selection problem S` = {i S: Si fi}. why? Because if we could find a solution B` to S` with more activities then A`, adding 1 to B` would yield a solution B to S with more activities than A, there by contradicting the optimality. □



As an example consider the example. Given a set of activities to among lecture halls. Schedule all the activities using minimal lecture halls.In order to determine which activity should use which lecture hall, the algorithm uses the GREEDY-ACTIVITY-SELECTOR to calculate the activities in the first lecture hall. If there are some activities yet to be scheduled, a new lecture hall is selected and GREEDY-ACTIVITY-SELECTOR is called again. This continues until all activities have been scheduled.

LECTURE-HALL-ASSIGNMENT (s, f)
n = length [s)for i = 1 to n do HALL [i] = NILk = 1while (Not empty (s)) do HALL [k] = GREEDY-ACTIVITY-SELECTOR (s, t, n) k = k + 1return HALL

Following changes can be made in the GREEDY-ACTIVITY-SELECTOR (s, f) (see CLR).
j = first (s)A = ifor i = j + 1 to n do if s(i) not= "-" then if
GREED-ACTIVITY-SELECTOR (s, f, n)
j = first (s)A = i = j + 1 to n if s(i] not = "-" then if s[i] ≥ f[j] then A = AU{i} s[i] = "-" j = ireturn A


Correctness
The algorithm can be shown to be correct and optimal. As a contradiction, assume the number of lecture halls are not optimal, that is, the algorithm allocates more hall than necessary. Therefore, there exists a set of activities B which have been wrongly allocated. An activity b belonging to B which has been allocated to hall H[i] should have optimally been allocated to H[k]. This implies that the activities for lecture hall H[k] have not been allocated optimally, as the GREED-ACTIVITY-SELECTOR produces the optimal set of activities for a particular lecture hall.

Analysis
In the worst case, the number of lecture halls require is n. GREED-ACTIVITY-SELECTOR runs in θ(n). The running time of this algorithm is O(n2).
Two important Observations
Choosing the activity of least duration will not always produce an optimal solution. For example, we have a set of activities {(3, 5), (6, 8), (1, 4), (4, 7), (7, 10)}. Here, either (3, 5) or (6, 8) will be picked first, which will be picked first, which will prevent the optimal solution of {(1, 4), (4, 7), (7, 10)} from being found.
Choosing the activity with the least overlap will not always produce solution. For example, we have a set of activities {(0, 4), (4, 6), (6, 10), (0, 1), (1, 5), (5, 9), (9, 10), (0, 3), (0, 2), (7, 10), (8, 10)}. Here the one with the least overlap with other activities is (4, 6), so it will be picked first. But that would prevent the optimal solution of {(0, 1), (1, 5), (5, 9), (9, 10)} from being found.

Dynamic-Programming Algorithm for the Activity-Selection Problem
Huffman Codes

Huffman code is a technique for compressing data. Huffman's greedy algorithm look at the occurrence of each character and it as a binary string in an optimal way.

Example
Suppose we have a data consists of 100,000 characters that we want to compress. The characters in the data occur with following frequencies.


a
b
c
d
e
f
Frequency
45,000
13,000
12,000
16,000
9,000
5,000

Consider the problem of designing a "binary character code" in which each character is represented by a unique binary string.

Fixed Length Code
In fixed length code, needs 3 bits to represent six(6) characters.


a
b
c
d
e
f
Frequency
45,000
13,000
12,000
16,000
9,000
5,000
Fixed Length code
000
001
010
011
100
101

This method require 3000,000 bits to code the entire file.
How do we get 3000,000?
Total number of characters are 45,000 + 13,000 + 12,000 + 16,000 + 9,000 + 5,000 = 1000,000.
Add each character is assigned 3-bit codeword => 3 * 1000,000 = 3000,000 bits.

Conclusion
Fixed-length code requires 300,000 bits while variable code requires 224,000 bits.
=> Saving of approximately 25%.


Prefix Codes
In which no codeword is a prefix of other codeword. The reason prefix codes are desirable is that they simply encoding (compression) and decoding.

Can we do better?
A variable-length code can do better by giving frequent characters short codewords and infrequent characters long codewords.


a
b
c
d
e
f
Frequency
45,000
13,000
12,000
16,000
9,000
5,000
Fixed Length code
0
101
100
111
1101
1100

Character 'a' are 45,000 each character 'a' assigned 1 bit codeword. 1 * 45,000 = 45,000 bits.

Characters (b, c, d) are 13,000 + 12,000 + 16,000 = 41,000 each character assigned 3 bit codeword 3 * 41,000 = 123,000 bits

Characters (e, f) are 9,000 + 5,000 = 14,000 each character assigned 4 bit codeword. 4 * 14,000 = 56,000 bits.

Implies that the total bits are: 45,000 + 123,000 + 56,000 = 224,000 bits


Encoding: Concatenate the codewords representing each characters of the file.
String Encoding TEA 10 00 010 SEA 011 00 010 TEN 10 00 110

Example From variable-length codes table, we code the3-character file abc as:

a
b
c

0
101
100
=> 0.101.100 = 0101100



Decoding
Since no codeword is a prefix of other, the codeword that begins an encoded file is unambiguous.To decode (Translate back to the original character), remove it from the encode file and repeatedly parse.For example in "variable-length codeword" table, the string 001011101 parse uniquely as 0.0.101.1101, which is decode to aabe.
The representation of "decoding process" is binary tree, whose leaves are characters. We interpret the binary codeword for a character as path from the root to that character, where 0 means "go to the left child" and 1 means "go to the right child". Note that an optimal code for a file is always represented by a full (complete) binary tree.



Theorem A Binary tree that is not full cannot correspond to an optimal prefix code.

Proof Let T be a binary tree corresponds to prefix code such that T is not full. Then there must exist an internal node, say x, such that x has only one child, y. Construct another binary tree, T`, which has save leaves as T and have same depth as T except for the leaves which are in the subtree rooted at y in T. These leaves will have depth in T`, which implies T cannot correspond to an optimal prefix code.To obtain T`, simply merge x and y into a single node, z is a child of parent of x (if a parent exists) and z is a parent to any children of y. Then T` has the desired properties: it corresponds to a code on the same alphabet as the code which are obtained, in the subtree rooted at y in T have depth in T` strictly less (by one) than their depth in T.This completes the proof. □



a
b
c
d
e
f
Frequency
45,000
13,000
12,000
16,000
9,000
5,000
Fixed Length code
000
001
010
011
100
101
Variable-length Code
0
101
100
111
1101
1100

Fixed-length code is not optimal since binary tree is not full.
Figure

Optimal prefix code because tree is full binary
Figure



From now on consider only full binary tree
If C is the alphabet from which characters are drawn, then the tree for an optimal prefix code has exactly c leaves (one for each letter) and exactly c-1 internal orders. Given a tree T corresponding to the prefix code, compute the number of bits required to encode a file. For each character c in C, let f(c) be the frequency of c and let dT(c) denote the depth of c's leaf. Note that dT(c) is also the length of codeword. The number of bits to encode a file is
B (T) = S f(c) dT(c)
which define as the cost of the tree T.
For example, the cost of the above tree is
B (T) = S f(c) dT(c) = 45*1 +13*3 + 12*3 + 16*3 + 9*4 +5*4 = 224
Therefore, the cost of the tree corresponding to the optimal prefix code is 224 (224*1000 = 224000).

Constructing a Huffman code
A greedy algorithm that constructs an optimal prefix code called a Huffman code. The algorithm builds the tree T corresponding to the optimal code in a bottom-up manner. It begins with a set of c leaves and perform c-1 "merging" operations to create the final tree.
Data Structure used: Priority queue = QHuffman (c)n = cQ = cfor i =1 to n-1 do z = Allocate-Node () x = left[z] = EXTRACT_MIN(Q) y = right[z] = EXTRACT_MIN(Q) f[z] = f[x] + f[y] INSERT (Q, z)return EXTRACT_MIN(Q)


Analysis
Q implemented as a binary heap.
line 2 can be performed by using BUILD-HEAP (P. 145; CLR) in O(n) time.
FOR loop executed n - 1 times and since each heap operation requires O(lg n) time. => the FOR loop contributes (n - 1) O(lg n) => O(n lg n)
Thus the total running time of Huffman on the set of n characters is O(nlg n).


Operation of the Algorithm

An optimal Huffman code for the following set of frequencies a:1 b:1 c:2 d:3 e:5 g:13 h:2 Note that the frequencies are based on Fibonacci numbers.
Since there are letters in the alphabet, the initial queue size is n = 8, and 7 merge steps are required to build the tree. The final tree represents the optimal prefix code.
Figure
The codeword for a letter is the sequence of the edge labels on the path from the root to the letter. Thus, the optimal Huffman code is as follows:

h :
1






g :
1
0





f :
1
1
0




e :
1
1
1
0



d :
1
1
1
1
0


c :
1
1
1
1
1
0

b :
1
1
1
1
1
1
0
a :
1
1
1
1
1
1
1

As we can see the tree is one long limb with leaves n=hanging off. This is true for Fibonacci weights in general, because the Fibonacci the recurrence is
Fi+1 + Fi + Fi-1 implies that i Fi = Fi+2 - 1.
To prove this, write Fj as Fj+1 - Fj-1 and sum from 0 to i, that is, F-1 = 0.

Correctness of Huffman Code Algorithm
Proof Idea
Step 1: Show that this problem satisfies the greedy choice property, that is, if a greedy choice is made by Huffman's algorithm, an optimal solution remains possible.
Step 2: Show that this problem has an optimal substructure property, that is, an optimal solution to Huffman's algorithm contains optimal solution to subproblems.
Step 3: Conclude correctness of Huffman's algorithm using step 1 and step 2.


Lemma - Greedy Choice Property Let c be an alphabet in which each character c has frequency f[c]. Let x and y be two characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit.

Proof Idea
Take the tree T representing optimal prefix code and transform T into a tree T` representing another optimal prefix code such that the x characters x and y appear as sibling leaves of maximum depth in T`. If we can do this, then their codewords will have same length and differ only in the last bit.
Figures

Proof
Let characters b and c are sibling leaves of maximum depth in tree T. Without loss of generality assume that f[b] ≥ f[c] and f[x] ≤ f[y]. Since f[x] and f[y] are lowest leaf frequencies in order and f[b] and f[c] are arbitrary frequencies in order. We have f[x] ≤ f[b] and f[y] ≤ f[c]. As shown in the above figure, exchange the positions of leaves to get first T` and then T``. By formula, B(t) = c in C f(c)dT(c), the difference in cost between T and T` is
B(T) - B(T`) = f[x]dT(x) + f(b)dT(b) - [f[x]dT(x) + f[b]dT`(b) = (f[b] - f[x]) (dT(b) - dT(x)) = (non-negative)(non-negative) ≥ 0

Two Important Points
The reason f[b] - f[x] is non-negative because x is a minimum frequency leaf in tree T and the reason dT(b) - dT(x) is non-negative that b is a leaf of maximum depth in T.Similarly, exchanging y and c does not increase the cost which implies that B(T) - B(T`) ≥ 0. This fact in turn implies that B(T) ≤ B(T`), and since T is optimal by supposition, which implies B(T`) = B(T). Therefore, T`` is optimal in which x and y are sibling leaves of maximum depth, from which greedy choice property follows. This complete the proof. □


Lemma - Optimal Substructure Property Let T be a full binary tree representing an optimal prefix code over an alphabet C, where frequency f[c] is define for each character c belongs to set C. Consider any two characters x and y that appear as sibling leaves in the tree T and let z be their parent. Then, considering character z with frequency f[z] = f[x] + f[y], tree T` = T - {x, y} represents an optimal code for the alphabet C` = C - {x, y}U{z}.

Proof Idea
Figure

Proof
We show that the cost B(T) of tree T can be expressed in terms of the cost B(T`). By considering the component costs in equation, B(T) = f(c)dT(c), we show that the cost B(T) of tree T can be expressed in terms of the cost B(T`) of the tree T`. For each c belongs to C - {x, y}, we have dT(c) = dT(c)
cinC f[c]dT(c) = ScinC-{x,y} f[c]dT`(c) = f[x](dT` (z) + 1 + f[y] (dT`(z) +1) = (f[x] + f[y]) dT`(z) + f[x] + f[y]And B(T) = B(T`) + f(x) + f(y)

If T` is non-optimal prefix code for C`, then there exists a T`` whose leaves are the characters belongs to C` such that B(T``) < B(T`). Now, if x and y are added to T`` as a children of z, then we get a prefix code for alphabet C with cost B(T``) + f[x] + f[y] < B(T), Contradicting the optimality of T. Which implies that, tree T` must be optimal for the alphabet C. □



Theorem Procedure HUFFMAN produces an optimal prefix code.

Proof
Let S be the set of integers n ≥ 2 for which the Huffman procedure produces a tree of representing optimal prefix code for frequency f and alphabet C with c = n.If C = {x, y}, then Huffman produces one of the following optimal trees.
figure
This clearly shows 2 is a member of S. Next assume that n belongs to S and show that (n+1) also belong to S.Let C is an alphabet with c = n + 1. By lemma 'greedy choice property', there exists an optimal code tree T for alphabet C. Without loss of generality, if x and y are characters with minimal frequencies then a. x and y are at maximal depth in tree T and b. and y have a common parent z.Suppose that T` = T - {x,y} and C` = C - {x,y}U{z} then by lemma-optimal substructure property (step 2), tree T` is an optimal code tree for C`. Since C` = n and n belongs to S, the Huffman code procedure produces an optimal code tree T* for C`. Now let T** be the tree obtained from T* by attaching x and y as leaves to z.Without loss of generality, T** is the tree constructed for C by the Huffman procedure. Now suppose Huffman selects a and b from alphabet C in first step so that f[a] not equal f[x] and f[b] = f[y]. Then tree C constructed by Huffman can be altered as in proof of lemma-greedy-choice-property to give equivalent tree with a and y siblings with maximum depth. Since T` and T* are both optimal for C`, implies that B(T`) = B(T*) and also B(T**) = B(T) why? because
B(T**) = B(T*) - f[z]dT*(z) + [f[x] + f[y]] (dT*(z) + 1)] = B(T*) + f[x] + f[y]
Since tree T is optimal for alphabet C, so is T** . And T** is the tree constructed by the Huffman code.And this completes the proof. □


Theorem The total cost of a tree for a code can be computed as the sum, over all internal nodes, of the combined frequencies of the two children of the node.

Proof
Let tree be a full binary tree with n leaves. Apply induction hypothesis on the number of leaves in T. When n=2 (the case n=1 is trivially true), there are two leaves x and y (say) with the same parent z, then the cost of T is
B(T) = f(x)dT(x) + f[y]dT(y) = f[x] + f[y] since dT(x) = dT(y) =1 = f[child1 of z] + f[child2 of z].
Thus, the statement of theorem is true. Now suppose n>2 and also suppose that theorem is true for trees on n-1 leaves.Let c1 and c2 are two sibling leaves in T such that they have the same parent p. Letting T` be the tree obtained by deleting c1 and c2, we know by induction that
B(T) = leaves l` in T` f[l`]dT(l`) = internal nodes i` in T` f[child1of i`] + f [child2 of i`]
Using this information, calculates the cost of T.
B(T) = leaves l in T f[l]dT(l) = l not= c1, c2 f[l]dT(l) + f[c1]dT (c1) -1) + f[c2]dT (c2) -1) + f[c1]+ f[c2] = leaves l` in T` f[l]dT`(l) + f[c1]+ f[c2] = internal nodes i` in T` f[child1of i`] + f [child2 of i`] + f[c1]+ f[c2] = internal nodes i in T f[child1of i] + f[child1of i]
Thus the statement is true. And this completes the proof.

The question is whether Huffman's algorithm can be generalize to handle ternary codewords, that is codewords using the symbols 0,1 and 2. Restate the question, whether or not some generalized version of Huffman's algorithm yields optimal ternary codes? Basically, the algorithm is similar to the binary code example given in the CLR-text book. That is, pick up three nodes (not two) which have the least frequency and form a new node with frequency equal to the summation of these three frequencies. Then repeat the procedure. However, when the number of nodes is an even number, a full ternary tree is not possible. So take care of this by inserting a null node with zero frequency.

Correctness
Proof is immediate from the greedy choice property and an optimal substructure property. In other words, the proof is similar to the correctness proof of Huffman's algorithm in the CLR.

Spanning Tree and
Minimum Spanning Tree

Spanning Trees
A spanning tree of a graph is any tree that includes every vertex in the graph. Little more formally, a spanning tree of a graph G is a subgraph of G that is a tree and contains all the vertices of G. An edge of a spanning tree is called a branch; an edge in the graph that is not in the spanning tree is called a chord. We construct spanning tree whenever we want to find a simple, cheap and yet efficient way to connect a set of terminals (computers, cites, factories, etc.). Spanning trees are important because of following reasons.
Spanning trees construct a sparse sub graph that tells a lot about the original graph.
Spanning trees a very important in designing efficient routing algorithms.
Some hard problems (e.g., Steiner tree problem and traveling salesman problem) can be solved approximately by using spanning trees.
Spanning trees have wide applications in many areas, such as network design, etc.

Greedy Spanning Tree Algorithm
One of the most elegant spanning tree algorithm that I know of is as follows:
Examine the edges in graph in any arbitrary sequence.
Decide whether each edge will be included in the spanning tree.
Note that each time a step of the algorithm is performed, one edge is examined. If there is only a finite number of edges in the graph, the algorithm must halt after a finite number of steps. Thus, the time complexity of this algorithm is clearly O(n), where n is the number of edges in the graph.

Some important facts about spanning trees are as follows:
Any two vertices in a tree are connected by a unique path.
Let T be a spanning tree of a graph G, and let e be an edge of G not in T. The T+e contains a unique cycle.

Lemma The number of spanning trees in the complete graph Kn is nn-2.


Greediness It is easy to see that this algorithm has the property that each edge is examined at most once. Algorithms, like this one, which examine each entity at most once and decide its fate once and for all during that examination are called greedy algorithms. The obvious advantage of greedy approach is that we do not have to spend time reexamining entities.

Consider the problem of finding a spanning tree with the smallest possible weight or the largest possible weight, respectively called a minimum spanning tree and a maximum spanning tree. It is easy to see that if a graph possesses a spanning tree, it must have a minimum spanning tree and also a maximum spanning tree. These spanning trees can be constructed by performing the spanning tree algorithm (e.g., above mentioned algorithm) with an appropriate ordering of the edges.
Minimum Spanning Tree Algorithm Perform the spanning tree algorithm (above) by examining the edges is order of non decreasing weight (smallest first, largest last). If two or more edges have the same weight, order them arbitrarily.
Maximum Spanning Tree Algorithm Perform the spanning tree algorithm (above) by examining the edges in order of non increasing weight (largest first, smallest last). If two or more edges have the same weight, order them arbitrarily.

Minimum Spanning Trees
A minimum spanning tree (MST) of a weighted graph G is a spanning tree of G whose edges sum is minimum weight. In other words, a MST is a tree formed from a subset of the edges in a given undirected graph, with two properties:
it spans the graph, i.e., it includes every vertex of the graph.
it is a minimum, i.e., the total weight of all the edges is as low as possible.
Let G=(V, E) be a connected, undirected graph where V is a set of vertices (nodes) and E is the set of edges. Each edge has a given non negative length.
Problem Find a subset T of the edges of G such that all the vertices remain connected when only the edges T are used, and the sum of the lengths of the edges in T is as small as possible.
Let G` = (V, T) be the partial graph formed by the vertices of G and the edges in T. [Note: A connected graph with n vertices must have at least n-1 edges AND more that n-1 edges implies at least one cycle]. So n-1 is the minimum number of edges in the T. Hence if G` is connected and T has more that n-1 edges, we can remove at least one of these edges without disconnecting (choose an edge that is part of cycle). This will decrease the total length of edges in T.
G` = (V, T) where T is a subset of E. Since connected graph of n nodes must have n-1 edges otherwise there exist at least one cycle. Hence if G` is connected and T has more that n-1 edges. Implies that it contains at least one cycle. Remove edge from T without disconnecting the G` (i.e., remove the edge that is part of the cycle). This will decrease the total length of the edges in T. Therefore, the new solution is preferable to the old one.
Thus, T with n vertices and more edges can be an optimal solution. It follow T must have n-1 edges and since G` is connected it must be a tree. The G` is called Minimum Spanning Tree (MST).
Prim's Algorithm
This algorithm was first propsed by Jarnik, but typically attributed to Prim. it starts from an arbitrary vertex (root) and at each stage, add a new branch (edge) to the tree already constructed; the algorithm halts when all the vertices in the graph have been reached. This strategy is greedy in the sense that at each step the partial spanning tree is augmented with an edge that is the smallest among all possible adjacent edges.
MST-PRIM
Input: A weighted, undirected graph G=(V, E, w)Output: A minimum spanning tree T.
T={}Let r be an arbitrarily chosen vertex from V.U = {r}WHILE U < n DOFind u in U and v in V-U such that the edge (u, v) is a smallest edge between U-V.T = TU{(u, v)}U= UU{v}
AnalysisThe algorithm spends most of its time in finding the smallest edge. So, time of the algorithm basically depends on how do we search this edge.Straightforward methodJust find the smallest edge by searching the adjacency list of the vertices in V. In this case, each iteration costs O(m) time, yielding a total running time of O(mn).
Binary heapBy using binary heaps, the algorithm runs in O(m log n).
Fibonacci heapBy using Fibonacci heaps, the algorithm runs in O(m + n log n) time.
Dijkstra's Algorithm (Shortest Path)

Consider a directed graph G = (V, E).
Problem Determine the length of the shortest path from the source to each of the other nodes of the graph. This problem can be solved by a greedy algorithm often called Dijkstra's algorithm.
The algorithm maintains two sets of vertices, S and C. At every stage the set S contains those vertices that have already been selected and set C contains all the other vertices. Hence we have the invariant property V=S U C. When algorithm starts Delta contains only the source vertex and when the algorithm halts, Delta contains all the vertices of the graph and problem is solved. At each step algorithm choose the vertex in C whose distance to the source is least and add it to S.

The Classic Multiplication Algorithm

An algorithm is a set of rules for carrying out calculation either by hand or on a machine.
An algorithm is a finite step-by-step procedure to achieve a required result.
An algorithm is a sequence of computational steps that transform the input into the output.
An algorithm is a sequence of operations performed on data that have to be organized in data structures.
An algorithm is an abstraction of a program to be executed on a physical machine (model of Computation).
The most famous algorithm in history dates well before the time of the ancient Greeks: this is Euclid's algorithm for calculating the greatest common divisor of two integers.
The Classic Multiplication Algorithm
1. Multiplication, the American way:
Multiply the multiplicand one after another by each digit of the multiplier taken from right to left.
2. Multiplication, the English way:
Multiply the multiplicand one after another by each digit of the multiplier taken from left to right.

Algorithmic is a branch of computer science that consists of designing and analyzing computer algorithms
The “design” pertain to
The description of algorithm at an abstract level by means of a pseudo language, and
Proof of correctness that is, the algorithm solves the given problem in all cases.
The “analysis” deals with performance evaluation (complexity analysis).
We start with defining the model of computation, which is usually the Random Access Machine (RAM) model, but other models of computations can be use such as PRAM. Once the model of computation has been defined, an algorithm can be describe using a simple language (or pseudo language) whose syntax is close to programming language such as C or java.

Algorithm's Performance
Two important ways to characterize the effectiveness of an algorithm are its space complexity and time complexity. Time complexity of an algorithm concerns determining an expression of the number of steps needed as a function of the problem size. Since the step count measure is somewhat coarse, one does not aim at obtaining an exact step count. Instead, one attempts only to get asymptotic bounds on the step count. Asymptotic analysis makes use of the O (Big Oh) notation. Two other notational constructs used by computer scientists in the analysis of algorithms are Θ (Big Theta) notation and Ω (Big Omega) notation.The performance evaluation of an algorithm is obtained by totaling the number of occurrences of each operation when running the algorithm. The performance of an algorithm is evaluated as a function of the input size n and is to be considered modulo a multiplicative constant.
The following notations are commonly use notations in performance analysis and used to characterize the complexity of an algorithm.
Θ-Notation (Same order)
This notation bounds a function to within constant factors. We say f(n) = Θ(g(n)) if there exist positive constants n0, c1 and c2 such that to the right of n0 the value of f(n) always lies between c1g(n) and c2g(n) inclusive.



O-Notation (Upper Bound)
This notation gives an upper bound for a function to within a constant factor. We write f(n) = O(g(n)) if there are positive constants n0 and c such that to the right of n0, the value of f(n) always lies on or below cg(n).




Ω-Notation (Lower Bound)
This notation gives a lower bound for a function to within a constant factor. We write f(n) = Ω(g(n)) if there are positive constants n0 and c such that to the right of n0, the value of f(n) always lies on or above cg(n).


Algorithm Analysis
The complexity of an algorithm is a function g(n) that gives the upper bound of the number of operation (or running time) performed by an algorithm when the input size is n.
There are two interpretations of upper bound.
Worst-case Complexity
The running time for any given size input will be lower than the upper bound except possibly for some values of the input where the maximum is reached.
Average-case Complexity
The running time for any given size input will be the average number of operations over all problem instances for a given size.
Because, it is quite difficult to estimate the statistical behavior of the input, most of the time we content ourselves to a worst case behavior. Most of the time, the complexity of g(n) is approximated by its family o(f(n)) where f(n) is one of the following functions. n (linear complexity), log n (logarithmic complexity), na where a≥2 (polynomial complexity), an (exponential complexity).

Optimality
Once the complexity of an algorithm has been estimated, the question arises whether this algorithm is optimal. An algorithm for a given problem is optimal if its complexity reaches the lower bound over all the algorithms solving this problem. For example, any algorithm solving “the intersection of n segments” problem will execute at least n2 operations in the worst case even if it does nothing but print the output. This is abbreviated by saying that the problem has Ω(n2) complexity. If one finds an O(n2) algorithm that solve this problem, it will be optimal and of complexity Θ(n2).

Reduction
Another technique for estimating the complexity of a problem is the transformation of problems, also called problem reduction. As an example, suppose we know a lower bound for a problem A, and that we would like to estimate a lower bound for a problem B. If we can transform A into B by a transformation step whose cost is less than that for solving A, then B has the same bound as A.
The Convex hull problem nicely illustrates "reduction" technique. A lower bound of Convex-hull problem established by reducing the sorting problem (complexity: Θ(nlogn)) to the Convex hull problem.




Sets
A set is a collection of different things (distinguishable objects or distinct objects) represented as a unit. The objects in a set are called its elements or members. If an object x is a member of a set S, we write x S. On the the hand, if x is not a member of S, we write z S. A set cannot contain the same object more than once, and its elements are not ordered.For example, consider the set S= {7, 21, 57}. Then 7 {7, 21, 57} and 8 {7, 21, 57} or equivalently, 7 S and 8 S.We can also describe a set containing elements according to some rule. We write
{n : rule about n}
Thus, {n : n = m2 for some m N } means that a set of perfect squares.Set CardinalityThe number of elements in a set is called cardinality or size of the set, denoted S or sometimes n(S). The two sets have same cardinality if their elements can be put into a one-to-one correspondence. It is easy to see that the cardinality of an empty set is zero i.e., . MustiestIf we do want to take the number of occurrences of members into account, we call the group a multiset.For example, {7} and {7, 7} are identical as set but {7} and {7, 7} are different as multiset.Infinite Set A set contains infinite elements. For example, set of negative integers, set of integers, etc.Empty Set Set contain no member, denoted as or {}. SubsetFor two sets A and B, we say that A is a subset of B, written A B, if every member of A also is a member of B. Formally, A B if
x A implies x B
written
x A => x B.
Proper SubsetSet A is a proper subset of B, written A B, if A is a subset of B and not equal to B. That is, A set A is proper subset of B if A B but A B.Equal SetsThe sets A and B are equal, written A = B, if each is a subset of the other. Rephrased definition, let A and B be sets. A = B if A B and B A.Power SetLet A be the set. The power of A, written P(A) or 2A, is the set of all subsets of A. That is, P(A) = {B : B A}.For example, consider A={0, 1}. The power set of A is {{}, {0}, {1}, {0, 1}}. And the power set of A is the set of all pairs (2-tuples) whose elements are 0 and 1 is {(0, 0), (0, 1), (1, 0), (1, 1)}.Disjoint SetsLet A and B be sets. A and B are disjoint if A B = .Union of SetsThe union of A and B, written A B, is the set we get by combining all elements in A and B into a single set. That is,
A B = { x : x A or x B}.
For two finite sets A and B, we have identity
A B = A + B - AB
We can conclude
A B A + B
That is,
if A B = 0 then A B = A + B and if A B then A B
Intersection SetsThe intersection of set set A and B, written AB, is the set of elements that are both in A and in B. That is,
AB = { x : x A and x B}.
Partition of SetA collection of S = {Si} of nonempty sets form a partition of a set if
i. The set are pair-wise disjoint, that is, Si, Sj and i j imply Si Sj = .
ii. Their union is S, that is, S = Si
In other words, S form a partition of S if each element of S appears in exactly on Si.Difference of SetsLet A and B be sets. The difference of A and B is
A - B = {x : x A and x B}.
For example, let A = {1, 2, 3} and B = {2, 4, 6, 8}. The set difference A - B = {1, 3} while B-A = {4, 6, 8}.Complement of a SetAll set under consideration are subset of some large set U called universal set. Given a universal set U, the complement of A, written A', is the set of all elements under consideration that are not in A.Formally, let A be a subset of universal set U. The complement of A in U is
A' = A - U
OR
A' = {x : x U and x A}.
For any set A U, we have following laws
i. A'' = Aii. A A' = .iii. A A' = U
Symmetric difference Let A and B be sets. The symmetric difference of A and B is
A B = { x : x A or x B but not both}

Therefore,

A B = (A B) - (AB)
As an example, consider the following two sets A = {1, 2, 3} and B = {2, 4, 6, 8}. The symmetric difference, A B = {1, 3, 4, 6, 8}.SequencesA sequence of objects is a list of objects in some order. For example, the sequence 7, 21, 57 would be written as (7, 21, 57). In a set the order does not matter but in a sequence it does.Hence, (7, 21, 57) {57, 7, 21} But (7, 21, 57) = {57, 7, 21}.Repetition is not permitted in a set but repetition is permitted in a sequence. So, (7, 7, 21, 57) is different from {7, 21, 57}.TuplesFinite sequence often are called tuples. For example,
(7, 21) 2-tuple or pair(7, 21, 57) 3-tuple(7, 21, ..., k ) k-tuple
An ordered pair of two elements a and b is denoted (a, b) and can be defined as (a, b) = (a, {a, b}).Cartesian Product or Cross ProductIf A and B are two sets, the cross product of A and B, written A×B, is the set of all pairs wherein the first element is a member of the set A and the second element is a member of the set B. Formally,
A×B = {(a, b) : a A, b B}.
For example, let A = {1, 2} and B = {x, y, z}. Then A×B = {(1, x), (1, y), (1, z), (2, x), (2, y), (2, z)}.When A and B are finite sets, the cardinality of their product is
A×B = A . B
n-tuplesThe cartesian product of n sets A1, A2, ..., An is the set of n-tuples
A1 × A2 × ... × An = {(a1, ..., an) : ai Ai, i = 1, 2, ..., n}
whose cardinality is
A1 × A2 × ... × An = A1 . A2 ... An
If all sets are finite. We denote an n-fold cartesian product over a single set A by the set
An = A × A × ... × A
whose cardinality is
An = An if A is finite.


Vectors and Matrices
A vector, u, means a list (or n-tuple) of numbers:
u = (u1, u2, . . . , un)
where ui are called the components of u. If all the ui are zero i.e., ui = 0, then u is called the zero vector.Given vectors u and v are equal i.e., u = v, if they have the same number of components and if corresponding components are equal.
Addition of Two Vectors
If two vectors, u and v, have the number of components, their sum, u + v, is the vector obtained by adding corresponding components from u and v.
u + v = (u1, u2, . . . , un) + (v1, v2, . . . , vn)
= (u1 + v1 + u2 + v2, . . . , un + vn)


Multiplication of a vector by a Scalar
The product of a scalar k and a vector u i.e., ku, is the vector obtained by multiplying each component of u by k:
ku = k(u1, u2, . . . , un)
= ku1, ku2, . . . , kun
Here, we define -u = (-1)u and u-v = u +(-v)It is not difficult to see k(u + v) = ku + kv where k is a scalar and u and v are vectorsDot Product and NormThe dot product or inner product of vectors u = (u1, u2, . . . , un) and v = (v1, v2, . . . , vn) is denoted by u.v and defined by
u.v = u1v1 + u2v2 + . . . + unvn
The norm or length of a vector, u, is denoted by u and defined by


Matrices
Matrix, A, means a rectangular array of numbers
A =
The m horizontal n-tuples are called the rows of A, and the n vertical m-tuples, its columns. Note that the element aij, called the ij-entry, appear in the ith row and the jth column.In algorithmic (study of algorithms), we like to write a matrix A as A(aij).
Column Vector
A matrix with only one column is called a column vector

Zero Matrix
A matrix whose entries are all zero is called a zero matrix and denoted by 0.

Matrix Addition
Let A and B be two matrices of the same size. The sum of A and B is written as A + B and obtained by adding corresponding elements from A and B. A+B = =

Scalar Multiplication
The product of a scalar k and a matrix A, written kA or Ak, is the matrix obtained by multiplying each element of A by k.
kA = k
=
Here, we define -A = (-1)A and A - B = A + (-B). Note that -A is the negative of the matrix A.Properties of Matrix under Addition and MultiplicationLet A, B, and C be matrices of same size and let k and l be scalars. Then
(A + B) + A + (B + C)
A + B = B + A
A + 0 = 0 + A = A
A + (-A) = (-A) + A = 0
k(A + B) = kA + kB
(k + l)A = kA + lA
(kl)A = k(lA)
lA = A

Matrix Multiplication
Suppose A and B are two matrices such that the number of columns of A is equal to number of rows of B. Say matrix A is an m×p matrix and matrix B is a p×n matrix. Then the product of A and B is the m×n matrix whose ij-entry is obtained by multiplying the elements of the ith row of a by the corresponding elements of the jth column of B and then adding them.It is important to note that if the number of columns of A is not equal to the number of rows of B, then the product AB is not defined.Properties of Matrix MultiplicationLet A, B, and C be matrices and let k be a scalar. Then
(AB)C = A(BC)
A(B+C) = AB + AC
(B+C)A = BA + CA
k(AB) = (kB)B = A(kB)

Transpose
The transpose of a matrix A is obtained by writing the row of A, in order, as columns and denoted by AT. In other words, if A - (Aij), then B = (bij) is the transpose of A if bij - aji for all i and j.It is not hard to see that if A is an m×n matrix, then AT is an n×m matrix.For example if A = , then AT =
Square Matrix
If the number of rows and the number of columns of any matrix are the same, we say matrix is a square matrix, i.e., a square matrix has same number of rows and columns. A square matrix with n rows and n columns is said to be order n and is called an n-square matrix.The main diagonal, or simply diagonal, of an n-square matrix A = (aij) consists of the elements a11, a22, a33, . . . , amn.

Unit Matrix
The n-square matrix with 1's along the main diagonal and 0's elsewhere is called the unit matrix and usually denoted by I.
Why unit matrix?
The unit matrix plays the same role in matrix multiplication as the number 1 does in the usual multiplication of numbers.

AI = IA = A

for any square matrix A.
So what dude?
We can form powers of a square matrix X by defining

X2 = XX
X3 = X2X, XXX, . . .
X0 = I
Big deal!
We can also form polynomial in X. That is, for any polynomial

f(x) = a0 + a1x + a2x2 + . . . + anxn

We define f(x) to be the matrix

f(x) = a0I + a1x + a2x2 + . . . + anxn

Note that in the case that f(A) is the zero matrix, then matrix A said to be a zero of the polynomial f(x) or a root of the polynomial equation f(x) = 0.
Now you're talking !!!!

Invertible Matrices
A square matrix A is said to be invertible if there exists a matrix B with the property AB = BA = I (Identity Matrix). Such a matrix B is unique and it is called the matrix of A and is denoted by A-1. Here, the important observation is that B is the inverse of A if and only if A is the matrix of B. It is known that AB = I if and only if BA = I; hence it is necessary to test only one product to determine whether two given matrices are inverse.
Determinants
To each n-square matrix A = (aij), we assign a specific number called the determinants of A and denoted as
A = del(A)

=
where an n by n arrays of numbers enclosed by straight lines is called a determinant of order n.It is important to note that n by n array of numbers enclosed by straight line (see above) is NOT a matrix but denotes the number that the determinant function assigns to the enclosed array of numbers, i.e., the enclosed square matrix.The determinant of order one is a11 = a11The determinant of order two is = a11a22 - a12a21The determinant of order three is = a11a22a32+ a12a23a31 + a13a21a32 - a13a22a31 - a12a21a33 - a11a23a32The determinant of order fo... You get the picture !

General Definition of Determinant
The general definition of a determinant of order n is as follows
det(A) =
where the sum is over all possible permutations = (j1, j2 , . . . , jn ) of (1, 2, . . . , n). Here sgn( ) equals plus or minus one according as an even or an odd number of interchanges are required to change so that its numbers are in the usual order.An important property of the determinant function is Lemma. For any two n-square matrices A and B, det(AB) = det(A) . det(B).In simple words the lemma say that the determinant function is multiplicative.An important point in the context of invertible matrices and determinant isLemma. A square matrix is invertible if and only if it has a nonzero determinant. A good news and a bad news: If you're an under graduate, you're done here! now goto CLR- If you're graduate and interested in theoretical Computer Science its time to go and find some good on matrix theory. (You'll need this shit for Linear Programming, for example).Linear Inequalities and Linear Equations

Inequalities
The term inequality is applied to any statement involving one of the symbols <, >, , =" type="#_x0000_t75">. Example of inequalities are:
i. x=" type="#_x0000_t75"> 1 ii. x + y + 2z > 16 iii. p2 + q2 1/2iv. a2 + ab > 1

Fundamental Properties of Inequalities
1. If a b and c is any real number, then a + c b + c.
For example, -3 -1 implies -3+4 -1 + 4.
2. If a b and c is positive, then ac bc.
For example, 2 3 implies 2(4) 3(4).
3. If a b and c is negative, then ac=" type="#_x0000_t75"> bc.
For example, 3 9 implies 3(-2)=" type="#_x0000_t75"> 9(-2).
4. If a b and b c, then a c.
For example, -1/2 2 and 2 8/3 imply -1/2 8/3.

Solution of Inequality
By solution of the one variable inequality 2x + 3 7 we mean any number which substituted for x yields a true statement.For example, 1 is a solution of 2x + 3 7 since 2(1) + 3 = 5 and 5 is less than and equal to 7.By a solution of the two variable inequality x - y 5 we mean any ordered pair of numbers which when substituted for x and y, respectively, yields a true statement.For example, (2, 1) is a solution of x - y 5 because 2-1 = 1 and 1 5.By a solution of the three variable inequality 2x - y + z=" type="#_x0000_t75"> 3 we means an ordered triple of number which when substituted for x, y and z respectively, yields a true statement. For example, (2, 0, 1) is a solution of 2x - y + z=" type="#_x0000_t75"> 3.A solution of an inequality is said to satisfy the inequality. For example, (2, 1) is satisfy x - y 5.Two or more inequalities, each with the same variables, considered as a unit, are said to form a system of inequalities. For example,
x=" type="#_x0000_t75"> 0 y=" type="#_x0000_t75"> 02x + y=" type="#_x0000_t75"> 4
Note that the notion of a system of inequalities is analogous to that of a solution of a system of equations.
Any solution common to all of the inequalities of a system of inequalities is said to be a solution of that system of inequalities. A system of inequalities, each of whose members is linear, is said to be a system of linear inequalities.

Geometric Interpretation of Inequalities
An inequality in two variable x and y describes a region in the x-y plane (called its graph), namely, the set of all points whose coordinates satisfy the inequality.
The y-axis divide, the xy-plane into two regions, called half-planes.
Right half-planeThe region of points whose coordinates satisfy inequality x > 0.
Left half-planeThe region of points whose coordinates satisfy inequality x < 0.
Similarly, the x-axis divides the xy-plane into two half-planes.
Upper half-planeIn which inequality y > 0 is true.
Lower half-planeIn which inequality y < 0 is true.
What is x-axis and y-axis? They are simply lines. So, the above arguments can be applied to any line.Every line ax + by = c divides the xy-plane into two regions called its half-planes.
On one half-plane ax + by > c is true.
On the other half-plane ax + by < c is true.
Linear Equations
One Unknown
A linear equation in one unknown can always be stated into the standard form
ax = b
where x is an unknown and a and b are constants. If a is not equal to zero, this equation has a unique solution
x = b/a

Two Unknowns
A linear equation in two unknown, x and y, can be put into the form
ax + by = c

where x and y are two unknowns and a, b, c are real numbers. Also, we assume that a and b are no zero.Solution of Linear EquationA solution of the equation consists of a pair of number, u = (k1, k2), which satisfies the equation ax + by = c. Mathematically speaking, a solution consists of u = (k1, k2) such that ak1 + bk2 = c. Solution of the equation can be found by assigning arbitrary values to x and solving for y OR assigning arbitrary values to y and solving for x.Geometrically, any solution u = (k1, k2) of the linear equation ax + by = c determine a point in the cartesian plane. Since a and b are not zero, the solution u correspond precisely to the points on a straight line.
Two Equations in the Two Unknowns
A system of two linear equations in the two unknowns x and y is
a1x + b1x = c1a2x + b2x = c2
Where a1, a2, b1, b2 are not zero. A pair of numbers which satisfies both equations is called a simultaneous solution of the given equations or a solution of the system of equations.Geometrically, there are three cases of a simultaneous solution
If the system has exactly one solution, the graph of the linear equations intersect in one point.
If the system has no solutions, the graphs of the linear equations are parallel.
If the system has an infinite number of solutions, the graphs of the linear equations coincide.
The special cases (2) and (3) can only occur when the coefficient of x and y in the two linear equations are proportional.

OR => a1b2 - a2b1 = 0 => = 0
The system has no solution when The solution to system
a1x + b1x = c1a2x + b2x = c2
can be obtained by the elimination process, whereby reduce the system to a single equation in only one unknown. This is accomplished by the following algorithm
ALGORITHM
Step 1 Multiply the two equation by two numbers which are such that the resulting coefficients of one of the unknown are negative of each other. Step 2 Add the equations obtained in Step 1. The output of this algorithm is a linear equation in one unknown. This equation may be solved for that unknown, and the solution may be substituted in one of the original equations yielding the value of the other unknown.As an example, consider the following system
3x + 2y = 8 ------------ (1)2x - 5y = -1 ------------ (2)
Step 1: Multiply equation (1) by 2 and equation (2) by -3
6x + 4y = 16-6x + 15y = 3
Step 2: Add equations, output of Step 1
19y = 19
Thus, we obtain an equation involving only unknown y. we solve for y to obtain
y = 1
Next, we substitute y =1 in equation (1) to get
x = 2
Therefore, x = 2 and y = 1 is the unique solution to the system.

n Equations in n Unknowns
Now, consider a system of n linear equations in n unknowns
a11x1 + a12x2 + . . . + a1nxn = b1a21x1 + a22x2 + . . . + a2nxn = b2 . . . . . . . . . . . . . . . . . . . . . . . . . an1x1 + an2x2 + . . . + annxn = bn
Where the aij, bi are real numbers. The number aij is called the coefficient of xj in the ith equation, and the number bi is called the constant of the ith equation. A list of values for the unknowns,
x1 = k1, x2 = k2, . . . , xn = kn
or equivalently, a list of n numbers
u = (k1, k2, . . . , kn)
is called a solution of the system if, with kj substituted for xj, the left hand side of each equation in fact equals the right hand side.The above system is equivalent to the matrix equation.
or, simply we can write A × = B, where A = (aij), × = (xi), and B = (bi).The matrix is called the coefficient matrix of the system of n linear equations in the system of n unknown.The matrix is called the augmented matrix of n linear equations in n unknown. Note for algorithmic nerds: we store a system in the computer as its augmented matrix. Specifically, system is stored in computer as an N × (N+1) matrix array A, the augmented matrix array A, the augmented matrix of the system. Therefore, the constants b1, b2, . . . , bn are respectively stored as A1,N+1, A2,N+1, . . . , AN,N+1.
Solution of a Triangular System
If aij = 0 for i > j, then system of n linear equations in n unknown assumes the triangular form. a11x1 + a12x2 + . . . + a1,n-1xn-1 + a1nxn = b1 a22x2 + . . . + a2,n-1xn-1 + a2nxn = b2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . an-2,n-2xn-2 + an-2,n-1xn-1 + an-2,nxn-1 + a2nxn = b2 an-1,n-1xn-1 + an-1,nxn = bn-1 amnxn = bnWhere A = a11a22 . . . ann; If none of the diagonal entries a11,a22, . . ., ann is zero, the system has a unique solution.

Back Substitution Method
we obtain the solution of a triangular system by the technique of back substitution, consider the above general triangular system.1. First, we solve the last equation for the last unknown, xn;
xn = bn/ann
2. Second, we substitute the value of xn in the next-to-last equation and solve it for the next-to-last unknown, xn-1:
.
3. Third, we substitute these values for xn and xn-1 in the third-from-last equation and solve it for the third-from-last unknown, xn-2 :
.
In general, we determine xk by substituting the previously obtained values of xn, xn-1, . . . , xk+1 in the kth equation.
.


Gaussian Elimination
Gaussian elimination is a method used for finding the solution of a system of linear equations. This method consider of two parts.
This part consists of step-by-step putting the system into triangular system.
This part consists of solving the triangular system by back substitution.
x - 3y - 2z = 6 --- (1) 2x - 4y + 2z = 18 --- (2) -3x + 8y + 9z = -9 --- (3)
First PartEliminate first unknown x from the equations 2 and 3.(a) multiply -2 to equation (1) and add it to equation (2). Equation (2) becomes
2y + 6z = 6
(b) Multiply 3 to equation (1) and add it to equation (3). Equation (3) becomes
-y + 3z = 9
And the original system is reduced to the system
x - 3y - 2z = 6 2y + 6z = 6 -y + 3z = 9
Now, we have to remove the second unknown, y, from new equation 3, using only the new equation 2 and 3 (above).a, Multiply equation (2) by 1/2 and add it to equation (3). The equation (3) becomes 6z = 12.Therefore, our given system of three linear equation of 3 unknown is reduced to the triangular system
x - 3y - 2z = 6 2y + 6z = 6 6z = 12

Second Part
In the second part, we solve the equation by back substitution and get
x = 1, y = -3, z = 2
In the first stage of the algorithm, the coefficient of x in the first equation is called the pivot, and in the second stage of the algorithm, the coefficient of y in the second equation is the point. Clearly, the algorithm cannot work if either pivot is zero. In such a case one must interchange equation so that a pivot is not zero. In fact, if one would like to code this algorithm, then the greatest accuracy is attained when the pivot is as large in absolute value as possible. For example, we would like to interchange equation 1 and equation 2 in the original system in the above example before eliminating x from the second and third equation.That is, first step of the algorithm transfer system as
2x - 4y + 2z = 18 x - 4y + 2z = 18 -3x + 8y + 9z = -9
Determinants and systems of linear equationsConsider a system of n linear equations in n unknowns. That is, for the following system
a11x1 + a12x2 + . . . + a1nxn = b1 a21x1 + a22x2 + . . . + a2nxn = b2 . . . . . . . . . . . . . . . . . . . . . . . . . an1x1 + an2x2 + . . . + annxn = bn
Let D denote the determinant of the matrix A +(aij) of coefficients; that is, let D =A. Also, let Ni denote the determinants of the matrix obtained by replacing the ith column of A by the column of constants.Theorem. If D 0, the above system of linear equations has the unique solution .This theorem is widely known as Cramer's rule. It is important to note that Gaussian elimination is usually much more efficient for solving systems of linear equations than is the use of determinants.