I checked how fast CollEc computations run when executed in C and C++ through an R package. The underlying graph contained 47,192 authors who wrote at least one co-authored paper. I weighted the edges between co-authors by their number of joint papers.

First, I calculated the distance matrix. Distances are measured as the lengths of the least-cost paths found by Dijkstra's algorithm. Calculating those 2,227,084,864 cell values (47,192 x 47,192) and writing them to disk took 4.77 minutes in a process parallelized across 8 cores. Computing each author's closeness value and writing it to disk took 4.27 minutes in an 8-core process. Betweenness is quite slow in comparison.

The code still leaves room for improvement. All three measures are derived from least-cost paths, so it would be more efficient to derive those paths once and use them for all three measures rather than computing them three times. Another point is the structure of the parallel process: the chunk size of the iterations may not be optimal and could be tuned through further tests.

If users only access data for a small number of authors at once, it is not even necessary to precompute those values and store them on disk or in an SQL database. With the graph kept in memory, computations are nearly instant for small sets of authors.

See you tomorrow.

Christian Düben
Research Associate
Chair of Macroeconomics
Hamburg University
Von-Melle-Park 5, Room 3102
20146 Hamburg
Germany
+49 40 42838 1898
christian.dueben@uni-hamburg.de
http://www.christian-dueben.com

-----Original Message-----
From: Thomas Krichel <krichel@openlib.org>
Sent: Wednesday, 20 May 2020 14:14
To: Düben, Christian <Christian.Dueben@uni-hamburg.de>
Cc: CollEc Run <collec-run@lists.openlib.org>
Subject: Re: RePEc Visual

Düben, Christian writes
I went through some of the files and checked what I would need for an extension of CollEc. I have a few ideas in mind on what to add and how to present it in an interactive application.
It's very hard to do a worse job than I did visualizing that data!
When I consulted our IT department here at Hamburg University, they suggested hosting RePEc Visual on one of their managed Linux servers. At this point I am still waiting for the administration to process my application for such a server. And just like every administrative procedure at our institution, this takes a while. Once I have access to the respective infrastructure, I am going to test implementations of RePEc Visual and potential CollEc extensions on it. Those applications would of course run under an external domain, not a Hamburg University domain.
We could run this on the existing CollEc server. This would be especially valuable if you manage to find a way to run the calculations faster. At this time, it's dreadfully slow. You could just take over the whole thing, well almost. We need to keep the mention of the sponsor, and I'd like to be acknowledged as the original creator.
I do not have Telegram and apparently do not have the correct login credentials for the Skype setup on my office laptop. Do you use Zoom? If you do, I can send you a meeting link. If you do not, I will try to find out what login credentials our IT department set for Skype.
Zoom should be fine. I'm in UTC+7. I can do late evenings, no problem. My schedule is completely open. Maybe someone else would want to attend? I am copying CollEc-run.

--
Cheers,

Thomas Krichel
http://openlib.org/home/krichel
skype:thomaskrichel
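For readers of this thread, the measures described in the first message can be illustrated with a minimal Python sketch. It is not the C/C++ code from the R package. It assumes that the cost of an edge is the reciprocal of the number of joint papers (the thread only says edges are weighted by joint papers, not how weights become costs), and it uses one common definition of closeness: reachable authors divided by the sum of their distances. The names `build_graph`, `dijkstra`, and `closeness` are hypothetical.

```python
import heapq


def build_graph(joint_papers):
    """Build an undirected adjacency map from {(author_a, author_b): n_joint_papers}.

    Assumption: edge cost = 1 / n_joint_papers, so frequent co-authors are "closer".
    """
    adj = {}
    for (a, b), n in joint_papers.items():
        cost = 1.0 / n
        adj.setdefault(a, {})[b] = cost
        adj.setdefault(b, {})[a] = cost
    return adj


def dijkstra(adj, source):
    """Return least-cost distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for v, cost in adj[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def closeness(adj, source):
    """Closeness of source: reachable nodes divided by the sum of their distances."""
    dist = dijkstra(adj, source)
    total = sum(d for node, d in dist.items() if node != source)
    return (len(dist) - 1) / total if total > 0 else 0.0


# Toy data, purely illustrative: three authors and their joint paper counts.
joint_papers = {("A", "B"): 2, ("B", "C"): 1, ("A", "C"): 1}
adj = build_graph(joint_papers)
distances_from_a = dijkstra(adj, "A")
closeness_a = closeness(adj, "A")
```

Calling `dijkstra` once per author yields one row of the distance matrix, which is why the full matrix for 47,192 authors has 47,192 squared cells. Because closeness is computed from the same distances, the row can be reused rather than recomputed, which is the saving suggested in the first message. Computing rows on demand for a few authors, with the graph held in memory, matches the suggestion to skip precomputation for small queries.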