Q – What 3 Unix utilities would you take to a desert island, and why?
A: Find, Grep, hmm, gnuplot. Find and Grep are very capable, deep, and reliable. It would be easier not to have to re-do them myself, whereas sort, ls, df, even ps aren’t nearly as difficult to re-write myself.
Q: You have two strings, needle and haystack. Both are very long- haystack is at least as long as needle, could be longer. Potentially, millions of characters.
Q1: What’s the fastest way to determine whether all the characters in needle are found in haystack? Order doesn’t have to be the same, but if there are 27 ‘A’ characters in needle, we want to know whether there are at least 27 in haystack. And on through the rest of the, say, 8 bit character set.
A: Obviously, the trick here is to NOT do an N * N solution where each character in needle results in a read (or even a partial read) of haystack. Yet, one can’t avoid reading needle at least once. So: make an array with 256 elements, coresponding to character values 0 to 255. Initialize all array members to 0. Read through needle, incrementing the count for each character value when that character is encountered. For example: needle is “needle”. Character counts are:
Once needle has been completely read and counted, start reading haystack, and *decrement* the count (but stop at 0) for each character value whenever one of those characters appears. The read through haystack ends when either all the character counters are “0”, which is success, or when haystack is exhausted AND at least one character counter is greater than zero.
This solution is on the order of 2N + some constant, much better than the N * N obvious technique.
Q2: The A1 technique requires 256 counters of 20+ bits – most likely 32 bit integers. So it requires 1K bytes. How can the job be done with the very least amount of memory, but as long a run time as it takes?
A2: 2, 32 bit integers, and a character, are enough. One integer is an index (indice) into needle or haystack, one is a counter. The character is incremented through all possible character values. For each character value, the count is zeroed, then needle is stepped through, adding one to the counter for every instance of the target character code. After needle is completely counted, the same index is used to step through the locations in haystack, and the integer in the counter is decremented (provided it started off positive) any time the target character is found. But not below 0. If any trial gets to the end of haystack with a non-zero counter value, return a fail and give up. If the counter gets to zero while decrementing, look no further, increment the character, start on the next character.