I stumbled upon a mildly interesting problem yesterday: Given an Array a and a
permutation p, apply the permutation (in place) to the Array, using only O(1)
extra space. So, if b is the array after the algorithm, we want that
`a[i] == b[p[i]]`

.

Naively, we would solve our problem by doing something like this (I'm using go here):

```
func Naive(vals, perm []int) {
n := len(vals)
res := make([]int, n)
for i := range vals {
res[perm[i]] = vals[i]
}
copy(vals, res)
}
```

This solves the problem in O(n) time, but it uses of course O(n) extra space for the result array. Note also, that it does not really work in place, we have to copy the result back.

The simplest iteration of this, would be to simply use a sorting-algorithm of our choice, but use as a sorting key not the value of the elements, but the position of the corresponding field in the permutation array:

```
import "sort"
type PermSorter struct {
vals []int
perm []int
}
func (p PermSorter) Len() int {
return len(p.vals)
}
func (p PermSorter) Less(i, j int) bool {
return p.perm[i] < p.perm[j]
}
func (p PermSorter) Swap(i, j int) {
p.vals[i], p.vals[j] = p.vals[j], p.vals[i]
p.perm[i], p.perm[j] = p.perm[j], p.perm[i]
}
func Sort(vals, perm []int) {
sort.Sort(PermSorter{vals, perm})
}
```

This appears a promising idea at first, but as it turns out, this doesn't
*really* use constant space after all (at least not generally). The go sort
package uses introsort internally, which is a combination of quick- and
heapsort, the latter being chosen if the recursion-depth of quicksort exceeds a
limit in O(log(n)). Thus it uses actually O(log(n)) auxiliary space. Also, the
running time of sorting is O(n log(n)) and while time complexity wasn't part of
the initially posed problem, it would actually nice to have linear running
time, if possible.

Note also another point: The above implementation sorts perm, thus destroying the permutation array. Also not part of the original problem, this might pose problems if we want to apply the same permutation to multiple arrays. We can rectify that in this case by doing the following:

```
type NDPermSorter struct {
vals []int
perm []int
}
func (p NDPermSorter) Len() int {
return len(p.vals)
}
func (p NDPermSorter) Less(i, j int) bool {
return p.perm[p.vals[i]] < p.perm[p.vals[j]]
}
func (p NDPermSorter) Swap(i, j int) {
p.vals[i], p.vals[j] = p.vals[j], p.vals[i]
}
func NDSort(vals, perm []int) {
sort.Sort(NDPermSorter{vals, perm})
}
```

But note, that this only works, because we want to sort an array of consecutive integers. In general, we don't want to do that. And I am unaware of a solution that doesn't have this problem (though I also didn't think about it a lot).

The solution of solving this problem in linear time lies in a simple
observation: If we start at any index and iteratively jump to the *target*
index of the current one, we will trace out a cycle. If any index is not in the
cycle, it will create another cycle and both cycles will be disjoint. For
example the permutation

```
i 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
p[i] 2 13 1 5 3 15 14 12 8 10 4 19 16 11 9 7 18 6 17 0
```

will create the following set of cycles:

So the idea is to resolve every cycle separately, by iterating over the indices and moving every element to the place it belongs:

```
func Cycles(vals, perm []int) {
for i := 0; i < len(vals); i++ {
v, j := vals[i], perm[i]
for j != i {
vals[j], v = v, vals[j]
perm[j], j = j, perm[j]
}
vals[i], perm[i] = v, i
}
}
```

This obviously only needs O(1) space. The secret, why it also only uses O(n)
time lies in the fact, that the inner loop will not be entered for elements,
that are already at the correct position. Thus this is (from a complexity
standpoint at least) the optimal solution to the problem, as it is impossible
to use *less* than linear time for applying a permutation.

There is still one small problem with this solution: It also sorts the
permutation array. We need this, to know when a position is already occupied by
it's final element. In our algorithm this is represented by the fact, that the
permutation is equal to it's index at that point. But really, it would be nice
if we could mark the index *without* losing the order of the permutation. But
that is not hard either - because every index is non-negative, we can
simply negate every index we are done with. This will make a negative index out
of it and we can check for that if we encounter it later and skip it in this
case. After we are done, we only need to take care to flip everything back and
all should be fine:

```
func NDCycles(vals, perm []int) {
for i := 0; i < len(vals); i++ {
if perm[i] < 0 {
// already correct - unmark and go on
// (note that ^a is the bitwise negation
perm[i] = ^perm[i]
continue
}
v, j := vals[i], perm[i]
for j != i {
vals[j], v = v, vals[j]
// When we find this element in the future, we must not swap it any
// further, so we mark it here
perm[j], j = ^perm[j], perm[j]
}
vals[i] = v
}
}
```

Here we only mark the elements we will again encounter in the *future*. The
current index will always be unmarked, once we are done with the outer loop.

I am aware, that this is technically cheating; This solution relies on the
fact, that the upper-most bit of the permutation elements won't ever be set.
Thus, we actually *do* have O(n) auxiliary space (as in n bit), because these
bits are not necessary for the algorithm. However, since it is pretty unlikely,
that we will find an architecture where this is not possible (and go guarantees
us that it actually is, because len(vals) is *always* signed, so we cant have
arrays that are big enough for the msb being set anyway), I think I am okay
with it ;)

I ran sum Benchmarks on this an these are the figures I came up with:

n | Naive | NDCycle | NDSort |
---|---|---|---|

10 | 332 ns | 130 ns | 1499 ns |

100 | 883 ns | 1019 ns | 27187 ns |

1000 | 15046 ns | 17978 ns | 473078 ns |

10000 | 81800 ns | 242121 ns | 4659433 ns |

I did not measure space-use. The time of NDCycle for 10000 elements seems
suspicious - while it is not surprising, that in general it takes more time
than the naive approach, due to it's complexity, this jump is unexpected. Maybe
if I have the time I will investigate this and also measure memory use. In the
meantime, I
uploaded all the
code used here, so you can try it out yourself. You can run it with ```
go run
perm.go
```

and run the benchmarks with `go test -bench Benchmark.*`

.