Friday, February 25, 2011

Iterations based on feedback deepens your understanding

A few weeks ago I came across a "challenge" in the form of a competition from one of the JFokus exhibitors. It seemed fun and small enough for a spare-time project so I set out to solve it.

For those of you not reading Swedish I'll give you a translation of the problem statement:
"What should be done is to reduce a string as much as possible. You do this by repeatedly removing sub-strings. Not any sub-string, but only from a given set.
The string to reduce is:
VOCDIITEIOCRUDOIANTOCSLOIOCVESTAIOCVOLIOCENTSU
To reduce it you are allowed to remove occurrences of the following words:
TDD, DDD, DI, DO, OO, UI, ANT, CV, IOC, LOC, SU, VO

E.g. some repeated removals could be:
1) ...BIDDDOCDDD...  (remove DDD)
2) ...BIDDDOC...     (remove DDD again) 
3) ...BIOC...        (remove IOC) 
4) ...B...           and so on... 

What does the shortest string you can produce in this way look like?

To be a correct solution the program should be able to produce the shortest string given these in data and also for other start and reduction strings. In other words, it should be a general solution for this type of problem."


In an initial analysis session I managed to solve the given string by hand and thereby deduced an algorithm in three steps:
  • First split the start string based on "unreducible" chars, i.e. chars in the start string not part of any of the reduction strings.
  • Reduce each of the reducible sub-strings by recursive removal of reduction strings.
  • Concatenate the results into the shortest possible string.
Due to a limited (and unknown) amount of time to spend on the problem I decided to split the work into two iterations each resulting in a delivered solution to maximize my chances of being able to post at least some solution to the competition.

Iteration One
In the first iteration I decided to work on the second step of my deduced algorithm, i.e reduce the string by recursive removal of reduction strings, since it could stand as a solution on its own. Yes, this is definitely a brute-force, not so elegant, solution but never the less it solves the initially stated problem. I made the recursion happen inside a loop over all reduction strings to explore the shortest rest of applying the reduction strings in different order. From a performance point-of-view this pretty soon get out of control when the length of the start string and number of reduction strings grow, but for a iteration-one-solution I think it was OK.
Before packaging up my application I worked it over to ensure easily readable and neatly factored code guided by unit tests.

Iteration Two
Soon I got feedback that my solution did not solve all invariants of in-data used to evaluate submissions, i.e. it wasn't a good enough general solver. I supposed this was due to the severe performance problems kicking in for a bit longer start strings and higher number of reduction strings.
I quickly wrote some additional end-to-end unit tests with tougher in-data and got to work on adding implementation for the first and third steps of my initial algorithm. The well tested and factored code made it easy to add the new functionality. In addition I added caching to ensure that no sub-string is ever evaluated more than once. Running my end-to-end tests through the profiler tool provided evidence for a strong performance boost; the brute-force solution of iteration one did almost 140 000 recursions over the initially stated in-data to find the shortest string, now I was down to 92 recursions!
Architecturally-wise my solution wasn't fancy or innovative, I rather stayed with the OO-best-practices I normally use. The solution was implemented in Java with a simple interface describing the API for the logic.

public interface StringReducer {
    String reduce(String startString);
}

I now had three implementations of this interface (UnReducibleCharBasedStringReducer, RecursionBasedStringReducer and CachingStringReducer) that could be linked together to implement the complete algorithm.

Iteration Three
Again I got feedback that my solution did not solve all invariants of in-data used to evaluate submissions. At first I was a bit surprised since I really thought the solution was complete, but the feedback made me think harder (and open up another iteration) and soon I realized that my algorithm was lacking an important piece: For the general case, there is no guarantee that there will be a single string that is the shortest. It might very well be that the shortest result includes two or more strings with equal length. Consider for example the start string "ABC" and reduction strings "AB" and "BC". Reduction using "AB" gives the rest "C" while reduction using "BC" gives the rest "A", i.e. two different strings with equal length depending on the order in which the reduction strings are applied. So far my solution had randomly returned the latter of these results, omitting the first. Now I changed the StringReducer interface to include the notion of a Rest wrapping one or more strings with equal length. This required a pretty significant re-write of the implementations but the tests already in place guided the work and secured a high quality result.

In the process of re-write I also found another situation that could yield multiple results. Consider the start string "IOIOI" and the reduction string "IOI". This set-up could return two alternative results, "IO" and "OI", depending on which of the two occurrences of the reduction string that is removed first. Adding this to the RecursionBasedStringReducer was as easy as adding a new test and adding another loop in the recursive function.
With these modifications I submitted my solution for the third time, a couple of days before the dead-line of the competition.

Iteration Four
This time the feedback said that my submission was accepted and I waited for the jury to make their decision on the perceived best solution. I didn't win - but I followed the heated discussions taking off right after the jury had announced their decision. In this discussions several suggestions for hard-to-handle in-data were put forward. For each I added new tests to my solution and found that it could handle most of them. The exception was a rather long start string (not possible to split) with associated rather many reduction strings that could be used to reduce the entire start string. I'm pretty confident my solution eventually would have come to the right conclusion but the large number of reduction strings made the number of needed recursions huge. But of course the solution is to pay attention to the first reduction path returning an empty string. After that there is no need to continue searching since no string could possibly be shorter. Again adding a test and injecting a simple condition in the right placeRecursionBasedStringReducer was enough to solve the problem.
In doing so I also found a case of "false reduction strings", i.e. reduction strings containing characters not found in the start string. These strings will never be able to reduce the start string and hence can be removed right away instead of causing extra reduction paths to be evaluated. This finding caused another small update to the implementation and that's the current state of my StringReducer.

Conclusion
My take-away from solving this challenge is this real life example showing that working iteratively based on feedback helps you to arrive at a deeper understanding of the problem domain and a step-wise better solution, something we should always strive for in any project regardless of scope and domain.
In addition it was great fun!

No comments:

Post a Comment