A quick update on the new CollEc: setting up a database within a Docker container turned out to be a good idea. It kept crashing for days, and with Docker I could simply spin it up again without having to ask you to fix the main installation every time. The crashes were caused by a mismatch between MariaDB's default settings and the size of the inserted data. After adjusting various settings, the containerized database is now stable.

An issue that remains is the insert speed of LOAD DATA LOCAL INFILE. Calculating four distance matrices of more than 2.2 billion cells each and writing them to disk does not take much time. Loading them into the database, however, easily takes multiple days. MariaDB's limit on the number of columns per table requires the data to be inserted in long format, i.e. as more than 8.8 billion rows. I am working on reducing that insert time to no more than a few hours. Using the distance matrices' symmetry, the zeros along the main diagonal and the fact that authors in different subgraphs are not connected, I cut the table length by more than half. Instead of N^{2} rows, I now insert sum_{i} N_{i} (N_{i} - 1)/2 rows, where N_{i} is the number of authors in graph i. This reduces the more than 2.2 billion rows to around 1.02 billion rows for each of the four transition functions. As this still takes a long time, I am testing further modifications to the database. Apart from the variables I have already modified (net_read_timeout, net_write_timeout, wait_timeout, innodb-fatal-semaphore-wait-threshold, max_allowed_packet, innodb-buffer-pool-size), there are various MariaDB system variables whose appropriate values I have yet to figure out.

Betweenness calculations now run in a manageable amount of time, but are computed with only three of the four transition functions. The currently implemented exponential transition function generates edge weights small enough to crash the system when used in betweenness computations. The app therefore does not cover this combination.
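To make the row-count reduction concrete, here is a minimal Python sketch. It is not the code CollEc actually uses: the function names, the nested-list representation of each subgraph's distance matrix and the TSV writer are hypothetical illustrations. It compares the N^{2} rows of the full matrix with sum_{i} N_{i} (N_{i} - 1)/2, and emits only the strict upper triangle (i < j) of each subgraph's matrix as long-format rows. The output is written tab-separated with newline line endings, which are MariaDB's default field and line terminators for LOAD DATA. The sketch assumes that author identifiers contain no tabs, quotes or backslashes.

```python
import csv
from itertools import combinations
from typing import Iterable, Iterator, Sequence, Tuple

Row = Tuple[str, str, float]


def full_row_count(component_sizes: Sequence[int]) -> int:
    """Rows needed to store the complete N x N matrix in long format."""
    n = sum(component_sizes)
    return n * n


def reduced_row_count(component_sizes: Sequence[int]) -> int:
    """Rows left after dropping the diagonal, the duplicate lower triangle
    and every pair of authors that sit in different subgraphs."""
    return sum(k * (k - 1) // 2 for k in component_sizes)


def long_format_rows(authors: Sequence[str],
                     dist: Sequence[Sequence[float]]) -> Iterator[Row]:
    """Yield (author_a, author_b, distance) for the strict upper triangle
    (i < j) of one subgraph's symmetric distance matrix."""
    for i, j in combinations(range(len(authors)), 2):
        yield authors[i], authors[j], dist[i][j]


def write_tsv(rows: Iterable[Row], path: str) -> None:
    """Write rows tab-separated, one per line. Tab and newline are the
    default FIELDS/LINES terminators of MariaDB's LOAD DATA statement.
    Assumes identifiers contain no tabs, quotes or backslashes."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh, delimiter="\t", lineterminator="\n").writerows(rows)


if __name__ == "__main__":
    sizes = [4, 3, 1]  # hypothetical subgraph sizes
    print(full_row_count(sizes), reduced_row_count(sizes))  # 64 9
```

Working subgraph by subgraph is what removes the cross-component pairs: their distances are never turned into rows at all. With the example sizes, 64 full-matrix rows shrink to 9.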
Addressing the database issues is taking longer than I expected. You can test the app once I have dealt with the performance bottlenecks. The app and the code that generates the data once a day are ready to be deployed.

Have a nice day.

Kind regards,
Christian

Christian Düben
Research Associate
Chair of Macroeconomics
Hamburg University
Von-Melle-Park 5, Room 3102
20146 Hamburg
Germany
+49 40 42838 1898
christian.dueben@uni-hamburg.de
http://www.christian-dueben.com

-----Original Message-----
From: CollEc-run <collec-run-bounces@lists.openlib.org> On Behalf Of Düben, Christian
Sent: Wednesday, 10 June 2020 11:24
To: Thomas Krichel <krichel@openlib.org>
Cc: CollEc Run <collec-run@lists.openlib.org>
Subject: Re: [CollEc] RePEc Visual

I did indeed consider using the main installation. The container just turned out to be the easier solution because it automatically links the database to the other containers via the bridge network.

Christian Düben

-----Original Message-----
From: Thomas Krichel <krichel@openlib.org>
Sent: Wednesday, 10 June 2020 11:14
To: Düben, Christian <Christian.Dueben@uni-hamburg.de>
Cc: CollEc Run <collec-run@lists.openlib.org>
Subject: Re: [CollEc] RePEc Visual

Düben, Christian writes:
> Thanks. And sorry for breaking it in the first place. It should not happen again.
> I now use a containerized MariaDB which the other containers can directly access through the bridge network.
This issue should not have prevented you from using the main installation.

--
Cheers,

Thomas Krichel
http://openlib.org/home/krichel
skype:thomaskrichel
_______________________________________________
CollEc-run mailing list
CollEc-run@lists.openlib.org
http://lists.openlib.org/cgi-bin/mailman/listinfo/collec-run