Amalegeni-Go > Concurrent SQL | |||||||||||
Amalegeni-Go | |||||||||||
Example 1 Example 2 |
Easy concurrent SQL with amalegeni-go and GoA nice application of amalegeni-go is that of easily writing and running concurrent sql code, like the following examples of sharded database queries. When making the step from single-database to sharded-databases, it always seemed so complicated and tedious, writing pages and pages of code to express even the simplest query. The solution presented here scratches that itch: just write of few lines of sql mixed in with go-code, and you're done. Amalegeni-go is putting the 'ad-hoc' back into the 'ad-hoc queries'. Who said that coding for a sharded database should be a complex and frustrating experience? The database setupThe setup here includes 16 shard databases plus 1 extra database, the aggregation database. The latter serves to store the intermediate results returned from the shards, that needs some further processing. ![]() Shard Diagram The convention used for numbering the databases is that the aggregation db is numbered 0, and the shards are numbered 1 .. 16. So in following code you'll notice that iterations don't start at 0 but at 1. For example:
The dataTo have a big enough dataset I loaded the nodes and tags of openstreetmap's "europe-latest.osm.bz2" file in a 16-shard setup. Each and every shard is a Postgresql instance running in its own Xen domU server. And all the virtual servers are running on 1 piece of hardware equipped with an 8-core (16-threads) CPU.
It's quite a blob of data, the file "europe-latest.osm.bz2" is close to 17GB and after unzipping it your harddisk is suddenly 210 GB smaller. This article is not about reading and loading the data, just assume it magically ended up in these tables in the databases:
Note: only the nodes with tags are loaded, the 'standalone' nodes are ignored. Count the records: example 1Before we start tallying, look at the first line of following to code, to spot how amalegeni-go handles the shards identification: just put one or more dbid arguments in your function argument list.
In the generated go-code you'll see that these dbid arguments are passed to the function that 'gets' the database connection. You'll also notice that these dbid parameters are of the type 'int'. It's also possible to use type 'dbidstring', for string identifiers for your shards. But because I love speed, I stick to the int type.
Now we just need to make a loop around calling the above "Select" and Bob's your uncle. This is the way that it can be done:
Hey, why aren't we checking the errors? No worries: each and every error is logged, and reported at the end of the run. More about this later. Compile the above (named "count.amg" in the downloadable zipfile) and run it. This is the output:
Okay, not bad, but now quit there yet, since the shards are addressed in a serial fashion, and we want to run concurrently of course! You can read more detail on this example, or carry on with the next bit... Count the records concurrently : example 2The query happens in two steps:
And finally print the results.
Running the serialized sql of example 1 takes about 28 seconds, while running the concurrent count of example 2 takes under 3 seconds. Pretty good improvement, I'd say! | ||||||||||
© Willem Moors, 2013 - 2020 |