Something I find myself doing often is trying to build a hierarchical data structure (tree) from a flat definition of some kind. An example is that I have a simple formatting language whose parser outputs to HTML, and it can handle nested lists. The list is just a flat piece of text, but to convert it into valid HTML, it’s easiest to convert it into a tree and recurse down it. This is one of those situations where the right choice of data structure makes the resulting code a lot cleaner and more direct. I’ve had to do it in a bunch of other situations too, so it seems to be a semi-common task and it’s not immediately obvious how best to do it (unless I missed that class).
The way which seems to work best is to use a stack. Iterate through each of your elements in your definition and create a node for each one. The node needs to keep a list of children (and whatever other information is relevant). It doesn’t need to keep track of its parent. As you’re going through, put each node on the stack. When the node is complete*, pop it and append it to the children of the node which is now on the top of the stack. The last node you pop before the stack is empty should be the root node of your tree. Simples. You’ve now got a nice tree you can recurse down or whatever.
- It can be easier to put a dummy root node on the stack first. This is so that you don’t have to worry about popping the root node and it going out of scope.
- At the end, you may find you still have a bunch of nodes still on the stack. Obviously you have to pop these and append them as children before the process is complete.
- At any one iteration, your data definition might be such that you need to pop more than one node.
*Ascertaining when a node is complete depends how your data is laid out, but let’s say you’re using Pythonesque left-leading indentation, a node is complete as soon as the indentation level of the element you’re looking at drops below the indentation level that corresponds to your node.