Christian Düben writes
Honestly, precomputing all shortest paths is a terrible idea. It is unnecessarily inefficient.
This depends on what we say the aim is. I always thought the aim was for folks to see the path: here is your path to some other economist.
Centrality measures need to be computed beforehand, but paths should be derived during user sessions.
If you say so, you must be right. But my design does not suit this thinking. It was built on the idea of path first, centrality second. You think the opposite. I lack the knowledge to ascertain whether my approach or yours is better. I suspect it is a matter of business case.
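For reference, deriving a single path during a user session, as proposed above, could look like the minimal sketch below. It assumes an unweighted co-authorship graph held in memory as a dictionary of neighbour lists; the names `shortest_path` and `coauthors` are made up for illustration and are not part of CollEc. If the edges carry weights, Dijkstra's algorithm would be needed in place of the breadth-first search shown here.

```python
from collections import deque


def shortest_path(graph, origin, target):
    """Return one shortest path from origin to target, or None if unreachable.

    graph is assumed to map each author id to a list of co-author ids.
    """
    if origin == target:
        return [origin]
    parent = {origin: None}          # also serves as the visited set
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in parent:
                continue
            parent[neighbour] = node
            if neighbour == target:
                # Walk back through the parents to rebuild the path.
                path = [target]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical toy graph; "f" has no co-authors.
coauthors = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
    "f": [],
}

print(shortest_path(coauthors, "a", "e"))
print(shortest_path(coauthors, "a", "f"))
```

The search stops as soon as the target is reached, so one query touches only part of the graph, whereas precomputing needs a full traversal from every origin.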
All paths taken together occupy hundreds of GB on disk.
I am by no means a specialist in this, but the problem is not disk space. The problem is computing time.
The best way would be to store the data in Neo4j and update the database based on messages to an API.
I don't know what Neo4j is but, yes, I think that is correct. We want to calculate new paths on demand when we think that something has changed. I am not a specialist in this area; that's why I used my admittedly primitive but robust approach. Looking at Neo4j, I see it's a commercial offering, which is likely to lead to funding problems down the line.
But there is no API. CollEc's input is an XML file, which does not even come with a change log, just the full data set.
If you say what the changelog should be, we can build one.
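One possible shape for such a changelog, as a rough sketch only: compare two successive full data sets and list the co-authorship links that appeared or disappeared. The sketch assumes each data set has already been parsed into a dictionary from author id to a set of co-author ids; that layout and the names `edge_set` and `changelog` are assumptions for illustration, not the actual CollEc input format.

```python
def edge_set(snapshot):
    """Collect undirected co-authorship links from an author -> co-authors mapping."""
    edges = set()
    for author, coauthors in snapshot.items():
        for other in coauthors:
            if author != other:
                # frozenset makes (a, b) and (b, a) the same link.
                edges.add(frozenset((author, other)))
    return edges


def changelog(old, new):
    """Return the links added and removed between two snapshots."""
    old_edges, new_edges = edge_set(old), edge_set(new)
    added = sorted(tuple(sorted(e)) for e in new_edges - old_edges)
    removed = sorted(tuple(sorted(e)) for e in old_edges - new_edges)
    return {"added": added, "removed": removed}


# Hypothetical snapshots of two successive data sets.
old = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
new = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}

print(changelog(old, new))
```

Only authors near an added or removed link can have changed paths, so a list like this would tell either side which origins need recomputing.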
I can reduce the number of threads, i.e. the number of workers running in parallel, if the load is too heavy. RAM utilization is already minimal. The new code is the most performant program any version of CollEc has ever seen.
Yes, it would need to run continuously and write the paths to files per origin.
I have sacrificed multiple days to craft this piece of software exactly to your demands. You now have the binary paths for individual authors.
In a bunch of aggregates that are 100G each (?), which I then have to parse, but when?
You have distance values and you have closeness centrality results. Everything is stored in the requested antique output formats.
I can try to write software that tries to compile my path files from your output.

-- Written by Thomas Krichel http://openlib.org/home/krichel on his 21653rd day.