Re: [CollEc] Website fail

15 Sep 2024

      Christian Düben writes
...
For performance reasons, threads write to their own files.
I am not sure what threads are and why we need them here.  All I
  need is to have the paths from one author to all others in a
  file. These can all be run in parallel. In your run, you seem to try
  to do all authors at the same time. This poses a great strain on the
  machine.  I suggest calculating one author at a time, using
  parallel processing and a database of when author data has been
  changed.
...
This way, I can use parallelism without locks. If you prefer all
paths, distances, and closeness centrality values to respectively be
in single files instead of thread-specific files, I can change
that. However, that probably slows down the program's execution.
This massively parallel way of handling the job makes no sense to
  me.
...
All shortest paths within an author pair are not necessarily stored
consecutively. A paths file might contain the first shortest path
from author 1 to author 2, followed by the first shortest path from
author 1 to author 4, followed by the second shortest path from
author 1 to author 2. I can order them, if needed - again at a
performance penalty.
This makes no sense to me. This is not how I built the old
  CollEc. I ran a system that took nodes and updated them. Then
  I could run updates around the clock, and I could run as many
  processes as I had machine capacity for. That is a
  completely different approach from what you are trying, which is
  to make a complete calculation every now and then.

  Now the machine is so slow that I can hardly use it. 

  It would be better to solve the task at hand, which is
  to create a fast program to do binary paths for an
  individual author. I can then take this up and try
  to resuscitate the old site.
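
  As a rough illustration of what a per-author program could look
  like, here is a minimal sketch. Everything in it is an assumption:
  Python as the language, the names shortest_paths_from and
  write_author_file, the co-authorship graph given as a dict of
  neighbour lists, and the one-file-per-author layout. It is not the
  old CollEc code. It runs a breadth-first search from one author and
  writes all shortest paths to that author's own file, grouped by
  target author.

```python
from collections import deque
from pathlib import Path


def shortest_paths_from(graph, source):
    """Return {target: [path, ...]} with every shortest path from source.

    graph is assumed to be a dict mapping an author id to a list of
    co-author ids (unweighted, undirected).
    """
    dist = {source: 0}
    preds = {source: []}   # predecessors on shortest paths
    order = []             # authors in breadth-first order
    queue = deque([source])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                preds[nb] = [node]
                queue.append(nb)
            elif dist[nb] == dist[node] + 1:
                preds[nb].append(node)
    # Build paths in breadth-first order, so predecessors come first.
    paths = {source: [[source]]}
    for node in order[1:]:
        paths[node] = [p + [node] for pr in preds[node] for p in paths[pr]]
    del paths[source]
    return paths


def write_author_file(graph, source, directory):
    """Write all shortest paths from source to <directory>/<source>.paths."""
    out = Path(directory) / f"{source}.paths"
    paths = shortest_paths_from(graph, source)
    with out.open("w", encoding="utf-8") as fh:
        for target in sorted(paths):
            for path in paths[target]:
                fh.write(" ".join(path) + "\n")
    return out
```

  Since each run touches only one author and one output file, several
  such runs can go side by side as separate processes without locks,
  and only authors whose data has changed need to be redone.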

-- 
  Written by Thomas Krichel http://openlib.org/home/krichel on his 21653rd day.