# The probability of finding a string of n characters

I tried to google this today but I only found a lot of unnecessarily complicated stuff. I wanted something simple but mathematically exact so I had to figure out this myself.

``` // p(n,m,b) - the probability of finding at least one occurence of a string of n characters (or digits/symbols)
// (not starting and ending with the same character) in a larger random string of m characters
// b - is the number of different types of characters
double p(int n, int m, int b) { return f(n,m,b) / Math.Pow(b,m); }
double f(int n, int m, int b) { return m<n ? 0 : b * f(n,m-1,b) + Math.Pow(b,m-n) - f(n,m-n,b); }```

Compare this with the standard function for calculating an approximate probability…

`double pa(int n, int m, int b) { return 1 - Math.Pow((Math.Pow(b,n)-1) / Math.Pow(b,n), m-n+1); }`

The f-function above calculates the total number of possible ways that the the sought for string can appear in the larger random string. Although it’s pretty short it’s not particularly fast. I made a much faster algorithm based on this idea…

For example: Let’s say you want to calculate the probability of finding a particular 5 digit number in a random sequence of 18 digits. We first use the f-function to calculate the number of possible ways it can occur:

The the probability is given by:

Here’s my faster algorithm

```double f(int n, int m, int b)
{
int nn = n; int sg = 1; int p = 0;
int q = 0; int a = 1; double sum = 0;
do {
q = m-nn;
p = q+a;
sum += sg * Combin(p,q) * Math.Pow(b,q);
sg = -sg;
a++;
nn += n;
} while (m-nn >= 0);
return sum;
}```

”Combin(p,q)” is of course the combination-function – ”p over q”.

Annonser