Christian Düben writes
For performance reasons, threads write to their own files.
I am not sure what threads are or why we need them here. All I need is the paths from one author to all others in a file. These per-author calculations can all be run in parallel. In your run, you seem to try to do all authors at the same time, which puts a great strain on the machine. I suggest calculating one author at a time, using parallel processing driven by a database that records when author data has changed.
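A minimal sketch of the per-author calculation described above, not the actual CollEc code. It assumes the co-authorship graph is an unweighted adjacency dict keyed by author id; the names `shortest_paths_from`, `write_author_paths` and the `.paths` file suffix are hypothetical. It finds every shortest path from one author to all others by breadth-first search and writes them to a single file for that author.

```python
from collections import deque
from pathlib import Path


def shortest_paths_from(graph, source):
    """Return {target: [path, ...]} holding every shortest path from source.

    graph is assumed to be {author_id: [coauthor_id, ...]} (unweighted).
    """
    dist = {source: 0}
    preds = {source: []}  # predecessors lying on some shortest path
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                # First visit: this is the shortest distance.
                dist[neighbour] = dist[node] + 1
                preds[neighbour] = [node]
                queue.append(neighbour)
            elif dist[neighbour] == dist[node] + 1:
                # Another equally short route to the same author.
                preds[neighbour].append(node)

    def expand(node):
        # Rebuild all shortest paths ending at node from the predecessor lists.
        if node == source:
            return [[source]]
        return [path + [node] for pred in preds[node] for path in expand(pred)]

    return {target: expand(target) for target in dist if target != source}


def write_author_paths(graph, source, out_dir):
    """Write all shortest paths from one author to a file of its own."""
    out_file = Path(out_dir) / f"{source}.paths"
    paths = shortest_paths_from(graph, source)
    with out_file.open("w", encoding="utf-8") as handle:
        # Sorting by target keeps the paths of one author pair together.
        for target in sorted(paths):
            for path in sorted(paths[target]):
                handle.write(" ".join(path) + "\n")
    return out_file
```

Because each call touches only one author's file, separate processes could handle different authors without locks, and only authors whose data has changed would need to be rerun.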
This way, I can use parallelism without locks. If you prefer all paths, distances, and closeness centrality values to respectively be in single files instead of thread-specific files, I can change that. However, that probably slows down the program's execution.
This massively parallel way of handling the job makes no sense to me.
The shortest paths for a given author pair are not necessarily stored consecutively. A paths file might contain the first shortest path from author 1 to author 2, followed by the first shortest path from author 1 to author 4, followed by the second shortest path from author 1 to author 2. I can order them if needed, again at a performance penalty.
This makes no sense to me. This is not how I built the old CollEc. I ran a system that took nodes and updated them. That way I could run updates around the clock, and I could run as many processes as I had machine capacity for. That is a completely different approach from what you are trying, which is to make a complete calculation every now and then. Now the machine is so slow that I can hardly use it. It would be better to solve the task at hand, which is to create a fast program to do binary paths for an individual author. I can then take this up and try to resuscitate the old site.
-- Written by Thomas Krichel http://openlib.org/home/krichel on his 21653rd day.