Shortest path between nodes

Written by Mark Samios 2/28/11 for more information visit demos.asymmetrics.com

The shortest path between nodes described here uses the depth search principle but with alterations to improve search performance. Using the straight algorithms for recursive searches involves considerable amounts of resources and time with large scale nodes networks. What we present here is a complete PHP class to minimize searches and search time reaching the shortest path possible.

If you want to see the application document click here.

The nodes PHP Class.

The functionality of the nodes class includes the following

Lets discuss each of these key elements of the our nodes search.

Recursive search

This is the main principle of a depth search. For each node the algorithm calls itself with each connected node as argument until a path is found or a limit is reached. The limit can be the number of recursions or the number of visited nodes thus far or some other mechanism.

Visited nodes tracking

In our nodes class a recursive limit is imposed in 2 ways. The visited nodes tracking is the first one. Previously visited nodes are separately stored. For each iteration and before another recursion takes place, the node is compared against the previously visited ones. If the node was previously visited is skipped. The visited nodes tracking occurs only for the current path scan and is not applicable to the global scope of a path search. Visited node tracking is also updated by navigation control parameters. In the included sample node, vertices are the navigation control parameters.

Visited nodes dynamic limit

Another key element of our nodes algorithm is the visited nodes dynamic limit. When the nodes class is constructed the user can feed directly the maximum number of visited nodes. When a path is found between two nodes the visited node limit is automatically modified. A new limit is set based on the shortest path already found. Another feature added to the scan algorithm of nodes is an adjacent node connect inhibitor taken from basic geometry and further explained in the adjacent nodes filtering. The dynamic control of visited nodes therefore reduces the number of future path scans as the algorithm will now abort based on the new limit set by the path found as well as the collection of.visited nodes.

Distance Breaks of current vs found paths

As the algorithm processes nodes it also keeps track of the total distance covered. Distance between nodes is pre-calculated when the nodes are first created or initially processed. A distance break occurs when the distance covered is equal or longer with the distance of the shortest path previously found. The distance break is implemented in two different places. During node iteration and during node recursion.

Navigation Control

There are navigation control parameters that are application dependent and can greatly enhance search performance. With the included sample the navigation parameters are node vertices. Node vertices offer two advantages. First node vertices can accurately calculate distances between nodes in a multi-dimensional environment and the distance between any node and the target node can be calculated at any time. The node search algorithm presented uses the distance between nodes as a decision making before executing the next iteration or recursion. In other words the next visited node can be based on a shortest distance criteria instead of a default arbitrary node order. When vertices are disabled the algorithm attempts to short non-visited nodes by distance so the closest nodes to the current one are processed first. This is done to increase the visited nodes storage as much as possible therefore increasing the chances of blocking nodes in the current path and reaching the target node faster than when having a random order for node iteration.

Dynamic shorting of nodes prior processing

As mentioned in the navigation control section, shorting of nodes can alter significantly the search process. Using application specific elements nodes can be shorted in a certain way, making the depth search extremely efficient. If we consider the worst case scenario where for 2 given nodes there is no path described, the dynamic shorting can still block visited nodes more efficiently than not having a shorting mechanism. For instance by visiting the closest to the current, nodes first the number of nodes we can reduce back-tracking. If we have nodes A,B,C connected to form a triangle and where sites AC > AB then the nodes will be processed in order A->B->C.

Dynamic removal of reversed node connects

When paths are found or when certain application elements are given the algorithm, we can remove reversed connections among nodes. In the first case when a path is found from nodes 1,2,3 where the dual connections exist among nodes the reversed nodes are removed. Afterwards the algorithm will have fewer node paths to consider. In addition for the second case, application specifics can be used to determine reversed node removal. If we use vertices to reach the target node based on a distance factor, "a moving closer to the target" outcome, from an iteration or recursion, may remove a reversed connection between 2 nodes, if one exists. Removal of reversed nodes also shows why node shorting methods are very important to be determined dynamically, during searches.

Adjacent node filtering

For any node its connections to other nodes are passed with the recursion process. Therefore the next recursive node scan, cannot access adjacent nodes of its parent. In other words given a triangle with nodes ABC the distances of AB+BC is always greater than AC. Therefore there is no point for the scan process of node B to attempt a scan via node C when the scan process is starts with node A and continuous with node B. This inhibit node process is based on geometry of triangles and shortens the node scan process and further explains the default nodes shorting by distance in the sample code..

Closest node to target tracking when no path is found

This feature requires at least one navigation control element to be in place and is application specific. With the sample code the use of vertices include the coordinates of the target node. Therefore during scanning the algorithm knows for any node visited the distance from the target node and stores it separately. When no navigation element is present there is no way to use the closest node to target node feature.

Dynamic optimization of paths by tracking visited node segments

As paths among nodes are discovered the algorithm attempts a path optimization technique. Given a set of found paths future searches in process attempt to optimize existing paths by keeping track of node segments. Node segments are valid connections among nodes during recursion as the search progresses visiting new nodes. For example for a search between nodes [ 1,5] if we have found a path among nodes [1,2,3,4,5] and our current node scan is 1,2,4 the path optimization routine will alter the found path to [1,2,4,5].

Tracking of alternative paths generation

In most cases the algorithm will not find the optimum path outright. Paths found are stored in two different areas. The main paths are stored with the final results while secondary or alternative paths are stored in a separate area. Some paths among nodes can be constructed on the fly given existing paths stored and node segments in process. Distance breaks may also generate alternative paths if the current node is close enough to the target node. The alternative paths generation offers several advantages to an application. For instance if a nodes in a path must be accessed by a single entity in a given time, alternative paths allows other entities to reach the target node at the same time. Because in the end paths found are utilized by the application itself. Following the depth search principle the nodes search algorithm will stop when all the possible paths are exhausted and so alternative paths will be likely present unless there is a significant difference between nodes and the first search pass is an optimum. However to force alternative paths generation the initialization routine allows controls of distance breaks at two different levels. If a distance break is disabled the search functions will continue the nodes scan without distance control thus allowing alternative paths generation.

Progressive removal and replacement of visited nodes belong to found paths

As each node search progresses, paths found are used to mark a global area of nodes visited. The global nodes visited area is different that the visited nodes used during recursion. It allows the algorithm to know what segments are a no-go during a specific node scan and abort the search quickly saving time and memory resources. In addition the global nodes visited area is also updated during node iteration of and recursion in order to recognize new paths, construct new segments. an reform existing nodes paths stored.

Automatic generation of random nodes

For test purposes the nodes class includes an automatic method to generate node maps. The node generation can be driven by vertices or by distance. When the nodes class is to process vertices, the vertices for each node is first generated, then a random connection to a set of other nodes is created and finally the distance between nodes is derived and placed together with the nodes map. When the node generation is driven by nodes distance only, the nodes are randomly generated and a random distance is set for each node connection. Depending on the application the random generation routine can be altered and tuned to match the specifics of the environment for testing purposes. With the included sample there is also a hard-coded set of nodes along with a diagram to for demonstration purposes.

File saving and loading of paths

Nodes can be saved to a file or restored from a file. Using a couple of helper functions the class provides methods to save or load node files. Node files can also include application specific elements. The sample includes code to also store and restore vertices of a 3-D environment.

Controls of debug, display and optimizations

The nodes class also includes display, debug and general controls that can alter its behavior based on the environment used. The display controls can show the task for each iteration or recursion the nodes class performs. The display controls can show detailed statistics including the nodes area, the final visited nodes area , number of revisits, recursions, stack level, path optimizations, distance breaks from paths and many other node scan parameters.

Implementations and discussion

You may wonder what's the relationship of a nodes topic with e-commerce and content management systems? What if you have only products and want to form categories based on product associations? The nodes class paradigm can be implemented with distances replaced by product relationships and with primary or alternative paths forming categories. What if a student or a junior employee has to to read a certain number of documents from a database where the relationship cannot be easily formed using disciplines and topics? Again a document relationship can form via sequence in a non-deterministic manner with a dedicated application of the nodes class. Possibilities are endless. And of course the nodes class can be used for maze solving, bot waypoints. chart navigation and multi-client environments.

In this topic we gave a fairly detailed description of the nodes class. The next topic demonstrates the use of the nodes class with a node sample and a diagram. Click here to see the sample application.

Shortest path between applied

Following the theory on the shortest path among nodes and in order to demonstrate the nodes class capabilities, we created a nodes diagram with 20 nodes connected.

Vertices and distances presets also show (figure-1).The connections and data of the nodes illustrated are fed into the nodes class so we can then try different source and target nodes and see how the nodes class behaves. Vertices are included with the sample but disabled for some tests to compare the results

Node Connections and paths

Figure-1 Sample of 20 node connected with vertices and distances.

Download code and document

Node tests and discussion

First test between nodes 1 to 3. Node 2 is the only node between 1 and 3 the nodes processing function will do the following:

Trying Path [1] via 2 to 3
Found Main Path [1,2,3]
Updating visited limit to 6
Trying Path [1] via 5 to 3
Distance Break d(7) <= d(13) setting visited from 5 to 4
Distance Break d(7) <= d(14) setting visited from 5 to 6
Distance Break d(7) <= d(18) setting visited from 5 to 1
Skipped 5 via 4 to 3 as visited
Skipped 5 via 1 to 3 as visited
Skipped 5 via 6 to 3 as visited
Blocking Path 1:[5]

For the 1,3 search the scan started by going through nodes 1 to 2 because 2 is the closest node to 3 by vertex. It is also by distance but as we have vertex control enabled the distance is not as important. Once the first path was found the visited limit was reduced and now the code searches for paths to be better than 2 nodes apart on the one hand and shorter than 7 in distance. Therefore when the search attempts to go from node 1 to node 5, a distance break aborts the search marking the nodes as visited because the distance between 1 to 5 is marked as 9. And 9 > 7 thus the abort.

The second test is between nodes 1 to 9 where the nodes processing function shows the following:

Trying Path [1] via 2 to 9
Trying Path [1,2] via 4 to 9
Found Main Path [1,2,4,9]
Updating visited limit to 7
Trying Path [1,2] via 3 to 9
Found Main Path [1,2,3,9]
Releasing Path 2:[4]
Blocking Path 1:[2]
Trying Path [1] via 5 to 9
Distance Break d(13) <= d(13) setting visited from 5 to 4
Distance Break d(13) <= d(14) setting visited from 5 to 6
Skipped 5 via 4 to 9 as visited
Skipped 5 via 6 to 9 as visited
Blocking Path 1:[5]
Releasing Path 1:[2]
Releasing Path 1:[5]

For the 1,9 search we found 2 different main paths. The second path was created because the total distance was less of the first. Lets give some explanation for the nodes sequence. Nodes 1 to 2 is visited first because node 2 is closer to node 9 comparing by vertex. Then node 4 is closer to node 9 again based on the vertex derived distance. From there the target node 9 is visible so the first path is found and it's stored. The visited nodes limit is also updated so we are now searching for paths having fewer or equal nodes and less distance to cover.

The process retracts back to node 2 based on depth search and tries node 3 because is the next closer node to 9 after checking the vertex difference. Node 3 is also connected to number 9 therefore another path is formed because here the total distance is less than the one the previous path.

The process retracts back to node 1 because there are no other nodes 2 is connected to. From node 1 it attempts a connection to node 5. Then from 5 to 4 and from 5 to 6. Both paths are discarded because of the distance break triggering. And there is no further processing because node 5 is completely processed and no other nodes are connected to node 1 once the process retracts back to the origin node.

Also note reversed node connections are not attempted for 2 reasons. They are adjacent to the current node in process and they can have the reverse node connection removed in the case a path is found or in the case the node processed is closer to the target node. Therefore during the nodes scan process connections from 2 to 1, 3 to 2, 4 to 2 and 9 to 3 are removed because of the paths found. Connections of 5 to 1 and of 4 to 5 are also removed because the parent node distance was greater than the node in process based on vertices calculations. This can give you a better understanding how the algorithm generates paths with other node combinations and why application elements introduced into the algorithm can greatly enhance performance.

Trying now with a complex example going from node 3 to node 8.

Trying Path [3] via 9 to 8
Trying Path [3,9] via 12 to 8
Skipped 12 via 9 to 8 as visited
Trying Path [3,9,12] via 19 to 8
Skipped 19 via 12 to 8 as visited
Trying Path [3,9,12,19] via 14 to 8
Skipped 14 via 19 to 8 as visited
Skipped 14 via 11 to 8 as visited
Trying Path [3,9,12,19,14] via 13 to 8
Skipped 13 via 11 to 8 as visited
Skipped 13 via 14 to 8 as visited
Blocking Path 14:[13]
Blocking Path 19:[14]
Trying Path [3,9,12,19] via 17 to 8
Skipped 17 via 19 to 8 as visited
Trying Path [3,9,12,19,17] via 18 to 8
Single Connect Advance 18:[16]
Found Main Path [3,9,12,19,17,18,16,8]
Updating visited limit to 11
Blocking Path 19:[17]
Blocking Path 12:[19]
Trying Path [3,9,12] via 11 to 8
Skipped 11 via 12 to 8 as visited
Skipped 11 via 10 to 8 as visited
Blocking Path 12:[11]
Blocking Path 9:[12]
Trying Path [3,9] via 4 to 8
Trying Path [3,9,4] via 15 to 8
Found Main Path [3,9,4,15,8]
Updating visited limit to 8
Skipped 4 via 9 to 8 as visited
Trying Path [3,9,4] via 5 to 8
Distance Break d(21) <= d(24) setting visited from 5 to 1
Skipped 5 via 4 to 8 as visited
Trying Path [3,9,4,5] via 6 to 8
Distance Break d(21) <= d(24) setting visited from 6 to 15
Distance Break d(21) <= d(25) setting visited from 6 to 20
Distance Break d(21) <= d(25) setting visited from 6 to 5
Skipped 6 via 15 to 8 as visited
Skipped 6 via 5 to 8 as visited
Skipped 6 via 20 to 8 as visited
Blocking Path 5:[6]
Skipped 5 via 1 to 8 as visited
Releasing Path 5:[6]
Blocking Path 4:[5]
Skipped 4 via 2 to 8 as visited
Blocking Path 9:[4]
Releasing Path 9:[12]
Blocking Path 3:[9]
Trying Path [3] via 10 to 8
Trying Path [3,10] via 11 to 8
Single Connect Advance 11:[12]
Blocking Path 10:[11]
Skipped 10 via 3 to 8 as visited
Releasing Path 10:[11]
Blocking Path 3:[10]
Trying Path [3] via 2 to 8
Trying Path [3,2] via 4 to 8
Single Connect Advance 4:[15]
Found Main Path [3,2,4,15,8]
Skipped 2 via 3 to 8 as visited
Trying Path [3,2] via 1 to 8
Trying Path [3,2,1] via 5 to 8
Distance Break d(18) <= d(20) setting visited from 5 to 4
Distance Break d(18) <= d(21) setting visited from 5 to 6
Skipped 5 via 4 to 8 as visited
Skipped 5 via 6 to 8 as visited
Blocking Path 1:[5]
Skipped 1 via 2 to 8 as visited
Releasing Path 1:[5]
Blocking Path 2:[1]
Blocking Path 3:[2]
Releasing Path 3:[9]

The first pass goes through nodes 3,9,12.. because node 9 then 12 are the nodes closer to the target node 8. From there the distance to the target node increases and so the reversed paths are not removed (eg: 19 to 12 connect stays) till the algorithm starts processing nodes from node 17 onwards. Again using the closer node approach the first path formed is 3,9,12,19,17,18,16,8. The process retracts to node 19 because node 16 is directly connected to the target node and the previous nodes 17,18 have no other connection and their reverse connects are also removed plus they are on the visited list of nodes.

Node segments 3,9,12 via 11 and 3,9,12,19 via 14 are also discarded as they form a closed circuit therefore eventually getting in the visited list leaving the search routine without exit point. Processing returns back to node 9. Using the closer to target approach the next pass goes through nodes 3,9,4,15 directly as all nodes in sequence bringing the search outcome closer to the target node because of the vertex distance calculations. The second path formed is 3,9,4,15,8 with less nodes and shorter distance than the first it also reduces the maximum limit of visited nodes.

Subsequent scans from node 9 are discarded because of distance breaks and the process is retracted to the origin node 3. Segment 3,10 is also discarded after scanning The next successful main node is node 2 where the process scans the surrounding nodes 3,2,4,15,8 is the next path which is the shortest found for this run.

The fourth example shows how a secondary path is formed. Source node is 11, target node is 9

Trying Path [11] via 12 to 9
Found Main Path [11,12,9]
Updating visited limit to 6
Trying Path [11] via 10 to 9
Trying Path [11,10] via 3 to 9
Distance Break as d(15) <= d(12) from 3 to 9
Found Secondary Path [11,10,3,9]
Skipped 10 via 11 to 9 as visited
Releasing Path 10:[3]
Blocking Path 11:[10]
Trying Path [11] via 14 to 9
Distance Break d(12) <= d(12) setting visited from 14 to 13
Skipped 14 via 11 to 9 as visited
Trying Path [11,14] via 19 to 9
Distance Break d(12) <= d(13) setting visited from 19 to 12
Distance Break d(12) <= d(14) setting visited from 19 to 14
Distance Break d(12) <= d(15) setting visited from 19 to 17
Skipped 19 via 12 to 9 as visited
Skipped 19 via 14 to 9 as visited
Skipped 19 via 17 to 9 as visited
Blocking Path 14:[19]
Skipped 14 via 13 to 9 as visited
Releasing Path 14:[19]
Blocking Path 11:[14]
Trying Path [11] via 13 to 9
Distance Break d(12) <= d(12) setting visited from 13 to 11
Distance Break d(12) <= d(14) setting visited from 13 to 14
Skipped 13 via 11 to 9 as visited
Skipped 13 via 14 to 9 as visited
Blocking Path 11:[13]
Releasing Path 11:[10]

So as we expected the nodes algorithm tries the closer node approach 11,12,9 and stores it as a main path. Then it tries 11,10,3 where it gets a distance break but the node 9 is accessible from the node where the distance break occurs. This case is stored separately as an alternative path because it is only a node apart from the optimum path thus far.

Lets now see how the path optimization works by trying to find paths between nodes 20 to 14

Trying Path [20,7,16,8,15,4,9,12] via 19 to 14
Found Main Path [20,7,16,8,15,4,9,12,19,14]
.....

Re-optimizing path [20,7,16,8,15,4,9,12,19,14] into [20,6,15,4,9,12,19,14]

In the process the nodes algorithm generates a main path 20,7,16,8,16,4,9,12,19,14. Later on the path optimization kicks in and reforms the path into 20,6,15,4,9,12,19,14. This happens because the current segment during the scan is set to 20,6,15 while in the main path node 15 is comes to play much later. So the new formed paths contains less nodes and a shorter distance. So the path is re-optimized removing redundant nodes but it still keeps a track of them in a separate global storage area to avoid re-processing. Reversed paths are also removed from all paths found where applicable ie: when a reversed node connection exists.

Using this PHP nodes class.

The sample code includes the nodes and vertices discussed in the examples here. The nodes constructor can be invoked with path and vertices arrays or when with a filename of nodes. The member function get_path accepts 2 parameters the source and target nodes. The get_result_string function retrieves the paths found while the get_stats shows statistics of the nodes scan. The short_nodes function will short the nodes by distance in this implementation.

Conclusions and further development

Additional application factors can be introduced or replace the vertices. If for instance speed is the main factor rather than vertices then the short_vertices function can be replaced or moved into a lower priority for ordering nodes.Implemented vertices in the sample have the z coordinate set always to 1 for simplicity. The random node generator gives a random value to all vertex coordinates.

The algorithm was tested with 100,000+ nodes each node having 4 connections to other nodes bringing results in msecs for valid routes. PHP version used during tests was 5.2.x but it should be compatible with older versions of PHP. Larger node sets can be accommodated via a database, to keep system resources low, than having complete node arrays loaded into memory. Path results can be retrieved on the client end via AJAX making calls to the server end keeping the class control on the server end for various types of applications requiring multiple clients. The visited limit can be adjusted by the user when the class is instantiated and can be set to a sensible value associated with the number of nodes in the map and average connections.

One of the areas that comes to mind for further development is a sequence of nodes as input to calculate paths. As the node algorithm stands right now, the input is always a source node and a target node, forming a sequence of 2 nodes.. If there is a need for higher number sequences the current method is to take pairs of nodes and then merge the final paths. Introducing sequences of 3 or more nodes as the main input to the nodes class may require different optimization techniques. For these reasons the path optimizations are controllable and can be switched off. Reverse connections among nodes can also taken out by commenting out couple of lines of code. Segment breaks can also be disabled and the visited nodes storage can be reset once the next target node in a sequence of goal nodes is reached. Overall the nodes class is structured in a general manner in order to accommodate different implementations and environments.

Different node maps were also tested having single and dual connects, open and closed node circuits where this algorithm performed efficiently. Different application factors were also implemented and for the most part the included distance factor had an average performance bringing satisfactory results against a time scale for large node sets of thousands of nodes. As previously mentioned different application factors can greatly alter performance. Distances mentioned here is a general factor and can be substituted based on application specifics. The nodes PHP class is released under GNU/GPL the class itself, this document and license details are also included with the implementation sample.

If you haven't see the shortest path theory behind the nodes class click here.