Hey, I think Helos is offline. Is that intended?

Kind regards,
Christian

Christian Düben
Doctoral Candidate
Chair of Macroeconomics
Hamburg University, Germany
christian.dueben@uni-hamburg.de
http://www.christian-dueben.com
Düben, Christian writes
I think Helos is offline. Is that intended?
No! -- Cheers, Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
Thomas Krichel writes
Düben, Christian writes
I think Helos is offline. Is that intended?
No!
It can be pinged. I did not get any warnings.

Cezar, please reboot Helos, 2a01:4f9:2b:276c::2. Kindly set a monitor for http://collec.repec.org with me and Christian as the mail recipients.

--
Cheers,
Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
I'm not able to set up monitoring just for the two of you. But I can recommend uptimerobot if http/https checks are all you need. It's free and has lots of integrations.

I don't even have access to helios (not that I'm complaining, it's ok that you guys manage your server :) )
Cezar Lica writes
I don't even have access to helios (not that I'm complaining, it's ok that you guys manage your server :) )
root@helos ~ # cat .ssh/authorized_keys | grep cezar
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAyx/bQCqKkfJ916S0XzZk75Vb2/vpK6BWXcaogN6kRU cezar@CLMB.local

If that key is no longer good, just let me know. I'm still grateful that you educated me about ed25519. I have been changing my keys to that type!

--
Cheers,
Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
Düben, Christian writes
I think Helos is offline. Is that intended?
Legacy CollEc is up. Something with the redirect does not work, and I can't ssh into the box to investigate.

--
Cheers,
Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
Düben, Christian writes
Hey,
I think Helos is offline. Is that intended?
On Skype (time UTC+7):

Cezar 19:32: Hey
Cezar: Can it wait till Monday?
23:09: That would be really bad, but what can I do about it.

I just came back from the beach.

--
Cheers,
Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
Cezar has reset helos; it's up again. Thanks to Cezar!
_______________________________________________ CollEc-run mailing list CollEc-run@lists.openlib.org http://lists.openlib.org/cgi-bin/mailman/listinfo/collec-run
-- Cheers, Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
It looks like the syslog kept running, and so did apache, but we could not get in via ssh. I'm at a loss here, and tired. It's close to midnight and I have a release at 7:00.
-- Cheers, Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
I just logged in and cleaned up the Docker containers. CollEc is up and running again. Thanks for fixing the server issues. Could the server have been compromised by an attack?

Christian Düben
Doctoral Candidate
Chair of Macroeconomics
Hamburg University, Germany
christian.dueben@uni-hamburg.de
http://www.christian-dueben.com
Düben, Christian writes
Thanks for fixing the server issues.
It was Cezar who rebooted.
Could the server have been compromised by an attack?
I don't have any evidence of that. A compromised machine would typically be kept running, so the attacker can use it. But we need to update the software regularly to keep known bugs out. Is there any update on an upgrade to focal or so?

--
Cheers,
Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
The fact that the machine still runs bionic should not be a problem. Bionic is still supported. And I just checked the logs: unattended-upgrades is running. A problem might be that some updates require a reboot, and we rarely reboot the machine. We could automate reboots using unattended-upgrades or a cron job.

By the way, I do not mind upgrading to focal. I am currently setting up a web app on an Ubuntu 20.04 machine for my new job at another university, and there are no compatibility issues with the shiny app.

At the beginning of June, I installed a script that records the times CollEc was accessed - no other variable, just the access time. When plotting the results aggregated by day, you can see that the number of daily app visits tends to fluctuate around 1,000 (see Subset.pdf). However, yesterday it surged to almost 30,000 (see Full_Period.pdf). Monit just notified me at 9:30 am today that the app was offline. So, I do not know whether that is related to the server issue. But tons of machines firing requests at port 80 on one day and the server becoming inaccessible on the next appears to be an odd coincidence.

As I have to design a web app for my new job, I learned how to set up an Nginx web server with TLS certificates, set up the firewall, etc. If you would be willing to drop Apache, I could install that on Helos as well. This is up to you. What we could also do is connect the app to the web server via the loopback interface instead of the host. That is apparently more secure. I know how to do this in Nginx, but not in Apache.

Christian Düben
Doctoral Candidate
Chair of Macroeconomics
Hamburg University, Germany
christian.dueben@uni-hamburg.de
http://www.christian-dueben.com
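[Editor's note: the automated reboot Christian mentions can be enabled in unattended-upgrades itself. A minimal sketch of the relevant options in /etc/apt/apt.conf.d/50unattended-upgrades; the reboot time is an example value, not the current helos configuration:]

```
// Reboot automatically when an installed update requires it
Unattended-Upgrade::Automatic-Reboot "true";
// Delay the reboot to a low-traffic hour (example value)
Unattended-Upgrade::Automatic-Reboot-Time "04:00";
```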
Düben, Christian writes
By the way, I do not mind upgrading to focal. I am currently setting up a web app on an Ubuntu 20.04 machine for my new job at another university and there are no compatibility issues with the shiny app.
Let's move ahead with the most up-to-date version that you can work with. I run debian testing everywhere. I can make myself smart enough to do the upgrade, but let me know which version you can go with.

--
Cheers,
Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
Düben, Christian writes
At the beginning of June, I installed a script that records the times CollEc was accessed - no other variable, just the access time. When plotting the results aggregated by day, you can see that the number of daily app visits tends to fluctuate around 1,000 (see Subset.pdf). However, yesterday it surged to almost 30,000 (see Full_Period.pdf). Monit just notified me at 9:30 am today that the app was offline. So, I do not know whether that is related to the server issue. But tons of machines firing requests at port 80 on one day and the server becoming inaccessible on the next appears to be an odd coincidence.
Well, if you just log the times, how can you claim it's a "ton of machines"? I did go through the apache log, and the surge does indeed appear to come from a bunch of servers from Huawei's petalsearch. The requests look legit. I'm sure they use reasonable defaults. It's just that the shinyapp is slow. Apache keeps saying 503, but it keeps logging, so it was still up. The odd thing is that we could not get through on ssh. Since that is our only route to the server, we are stuck and have to ask Cezar.

There is a change from 502 to 503:

114.119.158.156 - - [24/Jul/2021:09:04:30 +0200] "GET /app_direct/collec_app/?_inputs_&navbars=%22tab_Coauthors%22&_values_&g_author=%22pel60%22 HTTP/1.1" 502 646 "-" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
114.119.136.243 - - [24/Jul/2021:09:04:39 +0200] "GET /app_direct/collec_app/?_inputs_&navbars=%22tab_Coauthors%22&_values_&g_author=%22ppa963%22 HTTP/1.1" 502 646 "-" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
114.119.134.212 - - [24/Jul/2021:09:04:43 +0200] "GET /app_direct/collec_app/?_inputs_&navbars=%22tab_Coauthors%22&_values_&g_author=%22pkr268%22 HTTP/1.1" 503 575 "-" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
114.119.146.29 - - [24/Jul/2021:09:04:43 +0200] "GET /app_direct/collec_app/?_inputs_&navbars=%22tab_Coauthors%22&_values_&g_author=%22pbe625%22 HTTP/1.1" 503 575 "-" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"

at 9:04, so that's pretty consistent with what you note.
The non-accessibility presumably has to do with helos running out of memory, but why did the oom killer not work? Well, it ran, but it was not enough. We have in syslog:

root@helos /var/log # grep 'R invoked oom-killer' syslog.1
Jul 24 08:59:09 helos kernel: [14922235.506685] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Jul 24 09:30:07 helos kernel: [14924093.497980] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Jul 24 10:26:46 helos kernel: [14927492.848174] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Jul 24 10:58:08 helos kernel: [14929347.932058] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Jul 24 12:08:50 helos kernel: [14933616.461377] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Jul 24 12:58:00 helos kernel: [14936548.248476] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Jul 24 13:10:19 helos kernel: [14937294.624810] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Jul 24 13:23:38 helos kernel: [14938104.947025] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Jul 24 14:08:06 helos kernel: [14940762.579273] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Jul 24 14:24:13 helos kernel: [14941739.313980] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Jul 24 16:32:52 helos kernel: [14949437.368614] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Jul 24 17:50:50 helos kernel: [14954122.341626] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0

But seemingly these oom kills are not enough to keep ssh up.

I suspect what could be done is a script that checks whether the uptime is greater than a day. In that case, grep for 'R invoked oom-killer' in syslog; if found, reboot. Run that every hour. I've never written / run anything like that.

The easier thing is to disable petal via hosts.txt.

Your thoughts?

--
Cheers,
Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
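[Editor's note: the hourly watchdog Thomas describes could be sketched as a POSIX shell script run from cron. This is a hypothetical sketch, not something deployed on helos; the syslog path and the one-day threshold are assumptions taken from his description:]

```shell
#!/bin/sh
# Watchdog sketch: reboot if the machine has been up for more than a
# day AND the kernel has logged an OOM kill in the current syslog.

# decide UPTIME_SECONDS SYSLOG_FILE -> prints "reboot" or "ok"
decide() {
    if [ "$1" -gt 86400 ] && grep -q 'invoked oom-killer' "$2" 2>/dev/null; then
        echo reboot
    else
        echo ok
    fi
}

# In the real cron job one would run something like:
#   uptime_s=$(cut -d. -f1 /proc/uptime)   # first field: uptime in seconds
#   [ "$(decide "$uptime_s" /var/log/syslog)" = reboot ] && /sbin/reboot
```

Run hourly via a crontab entry such as `0 * * * * root /usr/local/sbin/oom-watchdog.sh` (path illustrative).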
BTW, since the machine was still up, if I had waited a few minutes, I might have been able to log in. My bad.

--
Cheers,
Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
I guessed that it was a number of different machines because, unless a machine deletes the cookies and cuts the web socket connection, it should only be counted once.

I do not know why R invoked the oom killer. The app should not use a lot of memory. When something spawns 30,000 instances of the app, including 30,000 Docker containers, within a short time frame, that might cross a line, though. After updating, I will install a script that logs memory use.

We could block petalsearch for a few days. I do not mind that app store / search engine browsing CollEc. However, the bot should send requests within a few sessions, not distribute them over 30,000 separate instances.

If you upgrade to a newer Ubuntu version, I recommend the latest LTS version: 20.04.2. The intermediate releases, like 21.04, are only supported for a few months. If you want to go with Debian testing instead, I do not know which version to recommend.

I do not know whether it also works for the type of machine that CollEc runs on, but Hetzner's (shared) cloud servers can be rebuilt through the cloud console on the Hetzner website. If you want a clean install that wipes the disk, that might be the preferred option. I upgraded a machine from Ubuntu 18.04 to 20.04 through the command line before. It was okay, but somehow it did not adequately connect the OS to repositories hosted by Hetzner itself.

(ShinyProxy) shiny apps are probably not as robust as full-fledged NodeJS apps. But they are what I can write at this point, and they grant access to the graph theoretical methods that CollEc's data relies on. And looking at my current workload, I am fairly certain that I will not rewrite the app in another language this year.

Christian Düben
Doctoral Candidate
Chair of Macroeconomics
Hamburg University, Germany
christian.dueben@uni-hamburg.de
http://www.christian-dueben.com
I see there is no http://collec.repec.org/robots.txt...

Christian Zimmermann
FIGUGEGL!
Economic Research
Federal Reserve Bank of St. Louis
P.O. Box 442
St. Louis MO 63166-0442 USA
https://ideas.repec.org/zimm/
@CZimm_economist
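[Editor's note: a robots.txt served at the web root could exclude PetalBot, assuming the bot honors the Robots Exclusion Protocol as its info page in the logged user agent claims. A minimal sketch:]

```
# Exclude Huawei's PetalBot crawler from the shiny app
User-agent: PetalBot
Disallow: /

# All other crawlers remain unrestricted
User-agent: *
Disallow:
```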
I did not find a way to add a robots.txt at the application level. We could add such a file at the web server level, though.

Christian Düben
Doctoral Candidate
Chair of Macroeconomics
Hamburg University, Germany
christian.dueben@uni-hamburg.de
http://www.christian-dueben.com
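[Editor's note: serving robots.txt at the web server level can be done in Apache with mod_alias; this is a sketch, and the file path is an assumption, not the actual helos layout:]

```
# In the Apache vhost for collec.repec.org:
# map /robots.txt to a static file outside the proxied app
Alias /robots.txt /var/www/collec/robots.txt
```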
D�ben, Christian writes
At the beginning of June, I installed a script that records the times CollEc was accessed - no other variable, just the access time. When plotting the results aggregated by day, you can see that the number of daily app visits tends to fluctuate around 1,000 (see Subset.pdf). However, yesterday it surged to almost 30,000 (see Full_Period.pdf). Monit just notified me at 9:30 am today that the app was offline. So, I do not know whether that is related to the server issue. But tons of machines firing requests at port 80 on one day and the server becoming inaccessible on the next appears to be an odd coincidence.
Well, if you just log the times, how can you claim it's "ton of machines"? I did go through the apache log, and the surge appears to come from indeed, a bunch of servers from Huawei's petalsearch. The requests look legit. I'm sure they use reasonable defaults. It just that the shinyapp is slow.
Apache keeps saying 503 but keeps logging so it was still up. The odd thing is that we could not get through on the ssh. Since we only have that route to the server we are stuck, and have to ask for Cezar.
There is a change for 502 to 503
114.119.158.156 - - [24/Jul/2021:09:04:30 +0200] "GET /app_direct/collec_app/?_inputs_&navbars=%22tab_Coauthors%22&_values_&g_author=%22pel60%22 HTTP/1.1" 502 646 "-" "Mozilla/5.0 (Linux; Androi d 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)" 114.119.136.243 - - [24/Jul/2021:09:04:39 +0200] "GET /app_direct/collec_app/?_inputs_&navbars=%22tab_Coauthors%22&_values_&g_author=%22ppa963%22 HTTP/1.1" 502 646 "-" "Mozilla/5.0 (Linux; Andro id 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)" 114.119.134.212 - - [24/Jul/2021:09:04:43 +0200] "GET /app_direct/collec_app/?_inputs_&navbars=%22tab_Coauthors%22&_values_&g_author=%22pkr268%22 HTTP/1.1" 503 575 "-" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)" 114.119.146.29 - - [24/Jul/2021:09:04:43 +0200] "GET /app_direct/collec_app/?_inputs_&navbars=%22tab_Coauthors%22&_values_&g_author=%22pbe625%22 HTTP/1.1" 503 575 "-" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
at 9:04 so that's pretty consistent with what you note. The non-accessibilty presumably has to do with helos running out of memory, but why did the oom killer not work? Well it run, but was not enough. We have in syslog
root@helos /var/log # grep 'R invoked oom-killer' syslog.1 Jul 24 08:59:09 helos kernel: [14922235.506685] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 Jul 24 09:30:07 helos kernel: [14924093.497980] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 Jul 24 10:26:46 helos kernel: [14927492.848174] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 Jul 24 10:58:08 helos kernel: [14929347.932058] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 Jul 24 12:08:50 helos kernel: [14933616.461377] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 Jul 24 12:58:00 helos kernel: [14936548.248476] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 Jul 24 13:10:19 helos kernel: [14937294.624810] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 Jul 24 13:23:38 helos kernel: [14938104.947025] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 Jul 24 14:08:06 helos kernel: [14940762.579273] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 Jul 24 14:24:13 helos kernel: [14941739.313980] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 Jul 24 16:32:52 helos kernel: [14949437.368614] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 Jul 24 17:50:50 helos kernel: [14954122.341626] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
But seemingly these OOM kills are not enough to keep ssh up.
I suspect what could be done is a script that checks whether the uptime is greater than a day. If it is, grep for 'R invoked oom-killer' in syslog; if found, reboot. Run that every hour. I've never written or run anything like that.
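[Editor's note: a minimal sketch of the check described above. This is hypothetical, not from the thread; the syslog path is an assumption, and the actual reboot line is commented out for safety. It would run hourly from root's crontab, e.g. `0 * * * * /root/oom-watchdog.sh`.]

```shell
#!/bin/sh
# Hypothetical OOM watchdog sketch: if the machine has been up for more
# than a day and syslog records OOM-killer activity, trigger a reboot.

SYSLOG=${SYSLOG:-/var/log/syslog}   # assumed log location

# Days since boot: the first field of /proc/uptime is seconds since boot.
up_days=$(awk '{print int($1 / 86400)}' /proc/uptime)

if [ "$up_days" -ge 1 ] && grep -q 'R invoked oom-killer' "$SYSLOG" 2>/dev/null; then
    echo "OOM activity after ${up_days} day(s) of uptime; rebooting"
    # /sbin/reboot    # uncomment for real use
fi
```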
The easier thing is to disable petal via hosts.txt.
Your thoughts?
--
Cheers,
Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
_______________________________________________ CollEc-run mailing list CollEc-run@lists.openlib.org http://lists.openlib.org/cgi-bin/mailman/listinfo/collec-run
Düben, Christian writes
As I have to design a web app for my new job, I learned how to set up a Nginx web server with TLS certificates, set up the firewall etc. If you would be willing to drop Apache, I could install that on Helos as well. This is up to you.
This looks doable, as there is no other web app apart from collec and collec legacy running there. So ok with me.
What we can also do is connect the app to the web server via the loopback interface instead of the host.
You mean a socket?
That is apparently more secure. I know how to do this in Nginx, but not in Apache.
Well we have a capacity problem, not a security issue. -- Cheers, Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
What I mean is to connect the containerized middleware ShinyProxy to the web server at 127.0.0.1:8080, instead of the current 0.0.0.0:8080. How shall we proceed? Shall we do a clean upgrade to Ubuntu 20.04 that wipes the disk? Or shall we just change the web server for now? Christian Düben Doctoral Candidate Chair of Macroeconomics Hamburg University Germany christian.dueben@uni-hamburg.de http://www.christian-dueben.com
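[Editor's note: a rough sketch of the Nginx side of the loopback setup discussed here. The server name is taken from the thread; the certificate paths are assumptions, and ShinyProxy is assumed to listen on 127.0.0.1:8080.]

```nginx
# Terminate TLS in nginx and proxy to ShinyProxy bound to loopback only.
server {
    listen 443 ssl;
    server_name collec.repec.org;

    ssl_certificate     /etc/letsencrypt/live/collec.repec.org/fullchain.pem;  # assumed path
    ssl_certificate_key /etc/letsencrypt/live/collec.repec.org/privkey.pem;    # assumed path

    location / {
        proxy_pass http://127.0.0.1:8080;   # ShinyProxy on the loopback interface
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Shiny apps behind ShinyProxy need WebSocket upgrades
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```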
Düben, Christian writes
What I mean is to connect the containerized middleware ShinyProxy to the web server at 127.0.0.1:8080, instead of the current 0.0.0.0:8080.
Sure ... why is it listening on 0.0.0.0:8080 anyway?
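[Editor's note: on the ShinyProxy side, the listen address is set in its `application.yml`. A minimal fragment, assuming the defaults described in the ShinyProxy configuration docs (`bind-address` defaults to 0.0.0.0, `port` to 8080):]

```yaml
# Bind ShinyProxy to loopback so only the local web server can reach it.
proxy:
  bind-address: 127.0.0.1
  port: 8080
```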
How shall we proceed? Shall we do a clean upgrade to Ubuntu 20.04 that wipes the disk? Or shall we just change the web server for now?
We can't wipe the disk. I have no space to first store the existing material that I have there. It's a RePEc-wide crisis. -- Cheers, Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
I cannot guarantee that none of the data would be lost, even with a command line-based upgrade to Ubuntu 20.04. So, I will just uninstall Apache and replace it with Nginx.
Düben, Christian writes
I cannot guarantee that none of the data would be lost, even with a command line-based upgrade to Ubuntu 20.04.
A command-line upgrade should not wipe the disk.
So, I will just uninstall Apache and replace it with Nginx.
When you have done that I can take over the o/s upgrade. -- Cheers, Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
CollEc is up. You can do the OS upgrade now.
Düben, Christian writes
CollEc is up. You can do the OS upgrade now.
Done. Reboot in progress... Here is the list of files it proposed to upgrade but that I kept at the current version:

/etc/monit/monitrc
/etc/nginx/nginx.conf
/etc/nginx/sites-available/default
/etc/exim4/conf.d/acl/30_exim4-config_check_rcpt
/etc/exim4/conf.d/acl/40_exim4-config_check_data
/etc/exim4/exim4.conf.template
/etc/ssh/ssh_config
/etc/mysql/debian-start
/etc/mysql/mariadb.conf.d/50-server.cnf

-- Cheers, Thomas Krichel http://openlib.org/home/krichel skype:thomaskrichel
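[Editor's note: when local versions of config files are kept during an upgrade, dpkg leaves the packaged versions alongside them with a `.dpkg-dist` suffix (ucf-managed files use `.ucf-dist`). A quick way to list them for later review:]

```shell
# List packaged config versions left behind after keeping local files.
find /etc -name '*.dpkg-dist' -o -name '*.ucf-dist'
```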
That should be ok. I cleaned up the Docker containers and the app works. The other processes seem to be in order.
participants (4)
- Cezar Lica
- Christian Zimmermann
- Düben, Christian
- Thomas Krichel