A lot of errors

Message boards : Number crunching : A lot of errors

boboviz
Joined: 19 May 20
Posts: 38
Credit: 45,765
RAC: 58
Message 602 - Posted: 16 Nov 2024, 15:40:25 UTC

Suddenly, a lot of errors on my machine:
195 (0x000000C3) EXIT_CHILD_FAILED

<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
16:36:30 (10667): wrapper (7.15.26016): starting
16:36:30 (10667): wrapper (7.15.26016): starting
16:36:30 (10667): wrapper: running ../../projects/gaiaathome.eu_gaiaathome/6_gaia@home[20241102.2234]_x86_64-pc-linux-gnu ()
16:36:31 (10667): 6_gaia@home[20241102.2234] exited; CPU time 0.003575
16:36:31 (10667): app exit status: 0x8b
16:36:31 (10667): called boinc_finish(195)
ID: 602 · Rating: 0
zupa
Joined: 21 Aug 19
Posts: 125
Credit: 888,695
RAC: 94
Message 603 - Posted: 16 Nov 2024, 20:13:55 UTC - in response to Message 602.  

Thanks,

I will try to fix this problem (job generator...).
ID: 603 · Rating: 0
[AF>Le_Pommier] Jerome_C2005
Joined: 28 Sep 20
Posts: 13
Credit: 654,014
RAC: 386
Message 604 - Posted: 17 Nov 2024, 18:21:51 UTC - in response to Message 603.  

It seems you solved it already: tasks were failing very quickly (2 secs) and I have 10 in error, but now I see I have 4 tasks still crunching after more than 1 hour.
ID: 604 · Rating: 0
Sabroe_SMC
Joined: 10 Nov 19
Posts: 6
Credit: 1,001,141
RAC: 5,415
Message 606 - Posted: 18 Nov 2024, 12:43:05 UTC - in response to Message 604.  
Last modified: 18 Nov 2024, 13:20:01 UTC

Nope. He did not solve it. He only did "something".
I have a lot of tasks in error from today.

https://gaiaathome.eu/gaiaathome/results.php?hostid=6574&offset=0&show_names=0&state=6&appid=
ID: 606 · Rating: 0
zupa
Joined: 21 Aug 19
Posts: 125
Credit: 888,695
RAC: 94
Message 607 - Posted: 18 Nov 2024, 16:44:44 UTC - in response to Message 606.  

Some of the damaged tasks are still in circulation, but new ones should no longer be produced.
ID: 607 · Rating: 0
Sabroe_SMC
Joined: 10 Nov 19
Posts: 6
Credit: 1,001,141
RAC: 5,415
Message 608 - Posted: 18 Nov 2024, 18:32:06 UTC - in response to Message 607.  
Last modified: 18 Nov 2024, 18:32:58 UTC

> Some of the damaged tasks are in circulation, but new ones should no longer be produced ....


I hope so
Have a good day
ID: 608 · Rating: 0
Dr Who Fan
Joined: 6 Nov 20
Posts: 5
Credit: 100,888
RAC: 1,445
Message 626 - Posted: 2 Dec 2024, 16:34:53 UTC

Suddenly seeing lots of errors on the tasks 4_Gaia@home v1.00 x86_64-pc-linux-gnu
"Exit status 195 (0x000000C3) EXIT_CHILD_FAILED"
https://gaiaathome.eu/gaiaathome/results.php?userid=5503&offset=0&show_names=0&state=6&appid=278
ID: 626 · Rating: 0
Dr Who Fan
Joined: 6 Nov 20
Posts: 5
Credit: 100,888
RAC: 1,445
Message 633 - Posted: 3 Dec 2024, 2:07:57 UTC - in response to Message 626.  

> Suddenly seeing lots of errors on the tasks 4_Gaia@home v1.00 x86_64-pc-linux-gnu
> "Exit status 195 (0x000000C3) EXIT_CHILD_FAILED"
> https://gaiaathome.eu/gaiaathome/results.php?userid=5503&offset=0&show_names=0&state=6&appid=278

And still getting the same error message. Up to 41 failed/errored tasks now.
ID: 633 · Rating: 0
PDW
Joined: 21 Aug 19
Posts: 23
Credit: 49,635,024
RAC: 294,071
Message 634 - Posted: 3 Dec 2024, 6:42:31 UTC - in response to Message 633.  

>> Suddenly seeing lots of errors on the tasks 4_Gaia@home v1.00 x86_64-pc-linux-gnu
>> "Exit status 195 (0x000000C3) EXIT_CHILD_FAILED"
>> https://gaiaathome.eu/gaiaathome/results.php?userid=5503&offset=0&show_names=0&state=6&appid=278
>
> And still getting same error message. Up to 41 failed/error now.

I'd suggest you regenerate!

Those tasks are fine on other hosts, which suggests it is a local problem. A project reset (losing current work) would get new executables.
ID: 634 · Rating: 0
marmot
Joined: 21 Aug 19
Posts: 10
Credit: 651,237
RAC: 10,914
Message 649 - Posted: 3 Dec 2024, 21:00:03 UTC
Last modified: 3 Dec 2024, 21:30:32 UTC

<message>
process exited with code 195 (0xc3, -61) EXIT_CHILD_FAILED
</message>

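The three numbers in that message (195, 0xc3, -61) are the same byte read as unsigned decimal, as hex, and as a signed 8-bit value. A minimal Python sketch of the arithmetic; the helper name `describe_exit_code` is made up for illustration and is not part of BOINC:

```python
def describe_exit_code(code: int) -> tuple:
    """Return (unsigned, hex, signed 8-bit) views of a one-byte exit code."""
    unsigned = code & 0xFF  # keep only the low byte
    # Two's complement: values 128..255 map to -128..-1.
    signed = unsigned - 256 if unsigned >= 128 else unsigned
    return unsigned, hex(unsigned), signed


print(describe_exit_code(195))  # (195, '0xc3', -61)
```

So 195 and -61 are not two different errors, only two spellings of EXIT_CHILD_FAILED.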

This was the same issue seen with Gaia WU #4 last year when this machine received them.

I would sit there for over an hour at a time, forcing updates and trying to get a full load of working 4_Gaia tasks, but only ended up with 10 hours of total CPU time before the WU generation ended.

It's happening again.

6_Gaia errors stopped on Nov 24 and were few.



Is there a library dependency issue? Does my client need an additional library downloaded?

ID: 649 · Rating: 0
marmot
Joined: 21 Aug 19
Posts: 10
Credit: 651,237
RAC: 10,914
Message 650 - Posted: 3 Dec 2024, 21:08:11 UTC - in response to Message 634.  
Last modified: 3 Dec 2024, 21:24:10 UTC



> Those tasks are fine on other hosts so it suggests it is a local problem, a project reset (losing current work) would get new executables.



I'll try that later, but 15 4_Gaia WUs are actually running successfully ATM. It just took 100 failures to get these working WUs.

Does the project server not do version checking and automatically replace the obsolete version upon an update request?
ID: 650 · Rating: 0
marmot
Joined: 21 Aug 19
Posts: 10
Credit: 651,237
RAC: 10,914
Message 653 - Posted: 4 Dec 2024, 3:06:29 UTC - in response to Message 650.  



>> Those tasks are fine on other hosts so it suggests it is a local problem, a project reset (losing current work) would get new executables.
>
> I'll try that later, but 15 4_Gaia WU are actually running successfully ATM. Just took 100 failures to get these working WU's.
>
> Does the project server not do version checking and automatically replace the obsolete version upon an update request?



The project reset did not stop the errors.
Some units start fine.
ID: 653 · Rating: 0
marmot
Joined: 21 Aug 19
Posts: 10
Credit: 651,237
RAC: 10,914
Message 659 - Posted: 5 Dec 2024, 6:01:46 UTC
Last modified: 5 Dec 2024, 6:12:17 UTC

NOTE: This machine ran virtually all 6_GAIA successfully.

There is a difference between the stderr output of successful and unsuccessful 4_GAIA tasks.

Unsuccessful:

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
21:47:36 (23006): wrapper (7.15.26016): starting
21:47:36 (23006): wrapper (7.15.26016): starting
21:47:36 (23006): wrapper: running ../../projects/gaiaathome.eu_gaiaathome/4_gaia@home[20241130.0949]_x86_64-pc-linux-gnu ()
21:50:49 (23006): 4_gaia@home[20241130.0949] exited; CPU time 181.616648
21:50:49 (23006): app exit status: 0xb
21:50:49 (23006): called boinc_finish(195)

</stderr_txt>
]]>


Successful (has permission failures when trying to execute an rm command?):
<core_client_version>8.0.4</core_client_version>
<![CDATA[
<stderr_txt>
07:09:26 (8010): wrapper (7.15.26016): starting
07:09:26 (8010): wrapper (7.15.26016): starting
07:09:26 (8010): wrapper: running ../../projects/gaiaathome.eu_gaiaathome/4_gaia@home[20241130.0949]_x86_64-pc-linux-gnu ()
rm: cannot remove 'model.obj': No such file or directory
rm: cannot remove 'observations.dat': No such file or directory

09:06:31 (8010): 4_gaia@home[20241130.0949] exited; CPU time 6622.652592
09:06:31 (8010): called boinc_finish(0)

</stderr_txt>
]]>


Are these deletions occurring and leading to failed WUs, while the successful WUs have permissions that disallow the deletions?
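One way to line such logs up field by field is a small parser. The Python sketch below is only an illustration: the function name `summarize_stderr` is hypothetical, and the regular expressions assume the wrapper line formats exactly as they appear in the two dumps above. It does not answer the permissions question; it just makes the differences (CPU time, app exit status, boinc_finish code) easy to compare across many results.

```python
import re

# Patterns for the three wrapper lines seen in the logs above.
CPU_RE = re.compile(r"exited; CPU time ([0-9.]+)")
STATUS_RE = re.compile(r"app exit status: (0x[0-9a-fA-F]+)")
FINISH_RE = re.compile(r"called boinc_finish\((\d+)\)")


def summarize_stderr(text: str) -> dict:
    """Pull CPU time, app exit status and boinc_finish code out of a stderr dump."""
    cpu = CPU_RE.search(text)
    status = STATUS_RE.search(text)
    finish = FINISH_RE.search(text)
    return {
        "cpu_seconds": float(cpu.group(1)) if cpu else None,
        "app_exit_status": int(status.group(1), 16) if status else None,
        "boinc_finish": int(finish.group(1)) if finish else None,
    }


failed = """21:50:49 (23006): 4_gaia@home[20241130.0949] exited; CPU time 181.616648
21:50:49 (23006): app exit status: 0xb
21:50:49 (23006): called boinc_finish(195)"""

print(summarize_stderr(failed))
# {'cpu_seconds': 181.616648, 'app_exit_status': 11, 'boinc_finish': 195}
```

A successful dump has no "app exit status" line, so that field comes back as `None` there.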
ID: 659 · Rating: 0
Dr Who Fan
Joined: 6 Nov 20
Posts: 5
Credit: 100,888
RAC: 1,445
Message 670 - Posted: 8 Dec 2024, 11:19:42 UTC - in response to Message 634.  

COLD REBOOT & PROJECT DETACH/REATTACH made no difference.
Still getting the -195 exit error ONLY on 4_Gaia@home tasks after a few seconds of trying to run.

Running Ubuntu 18.04.6 LTS [5.4.0-200-generic|libc 2.27]
ID: 670 · Rating: 0
PDW
Joined: 21 Aug 19
Posts: 23
Credit: 49,635,024
RAC: 294,071
Message 674 - Posted: 8 Dec 2024, 16:46:03 UTC - in response to Message 670.  

> COLD REBOOT & PROJECT DETACH/REATTACH made no difference.
> Still getting the -195 exit error ONLY on 4_Gaia@home tasks after a few seconds of trying to run.
>
> Running Ubuntu 18.04.6 LTS [5.4.0-200-generic|libc 2.27]

The boinc_finish(195) is not the real error.
The "app exit status: 0x8b" is the problem.
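As a rough guide to reading that value, here is a Python sketch that decodes it. It rests on an assumption that is not confirmed in this thread: that the wrapper prints the raw status returned by waitpid(), laid out in the conventional Linux way (low 7 bits = terminating signal, bit 7 = core-dump flag, next byte = exit code). The function name and the short signal table are illustrative only.

```python
# Linux x86-64 signal numbers; only a few common ones are listed here.
SIGNAL_NAMES = {4: "SIGILL", 6: "SIGABRT", 7: "SIGBUS", 8: "SIGFPE",
                9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def decode_wait_status(status: int) -> str:
    """Decode a raw wait status using the conventional Linux bit layout."""
    sig = status & 0x7F         # low 7 bits: terminating signal, 0 if none
    core = bool(status & 0x80)  # bit 7: core-dump flag
    if sig == 0:
        return f"exited normally with code {(status >> 8) & 0xFF}"
    if sig == 0x7F:
        return "stopped, not terminated"
    name = SIGNAL_NAMES.get(sig, f"signal {sig}")
    return f"killed by {name}" + (" (core dumped)" if core else "")


print(decode_wait_status(0x8B))  # killed by SIGSEGV (core dumped)
print(decode_wait_status(0x0B))  # killed by SIGSEGV
```

Under that assumption, both the 0x8b here and the 0xb reported earlier in the thread would point to signal 11, a segmentation fault in the app. If the value were instead a shell-style 128 + signal code, 0x8b (139) would still correspond to signal 11.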

Your host got this for a task:
05:03:10 (30950): wrapper (7.15.26016): starting
05:03:10 (30950): wrapper (7.15.26016): starting
05:03:10 (30950): wrapper: running ../../projects/gaiaathome.eu_gaiaathome/4_gaia@home[20241130.0949]_x86_64-pc-linux-gnu ()
05:03:11 (30950): 4_gaia@home[20241130.0949] exited; CPU time 0.002265
05:03:11 (30950): app exit status: 0x8b
05:03:11 (30950): called boinc_finish(195)

The next host to complete the task got this:
14:55:39 (1649670): wrapper (7.15.26016): starting
14:55:39 (1649670): wrapper (7.15.26016): starting
14:55:39 (1649670): wrapper: running ../../projects/150.254.66.104_gaiaathome/4_gaia@home[20241130.0949]_x86_64-pc-linux-gnu ()
rm: cannot remove 'model.obj': No such file or directory
rm: cannot remove 'observations.dat': No such file or directory
15:18:14 (1649670): 4_gaia@home[20241130.0949] exited; CPU time 1326.155024
15:18:14 (1649670): called boinc_finish(0)

These were the recorded memory values for the completed task:
Peak working set size 223.45 MB
Peak swap size 45.67 GB
Peak disk usage 0.22 MB

The Peak swap size is enormous and I've no idea why it has that value. It certainly can't be using all that swap space at once, as ALL my systems would die if that were real, given the number of tasks being run simultaneously.

Possibly the 4_gaia tasks ask for more initial memory than the 6_gaia tasks do and your host is not able to supply that amount.
Just a guess; you would need zupa to explain what the code is trying to do and why your host is failing all the 4_gaia tasks.

I would expect that if you were missing libs, or needed higher versions, the log would show that, like it does for libquadmath.
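A missing library can also be checked directly on the host instead of waiting for the log. The sketch below runs `ldd` on a binary and lists anything reported as "not found". The helper names are hypothetical and the sample text is an illustrative line in ldd's usual format, not output captured from any host in this thread. It only catches libraries that are absent entirely; a too-old glibc shows up as a separate "version ... not found" message that this parser does not look for.

```python
import subprocess


def missing_libraries(ldd_output: str) -> list:
    """Return library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str) -> list:
    """Run ldd on a binary and list unresolved shared libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)


# Illustrative sample only.
sample = (
    "\tlibquadmath.so.0 => not found\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)\n"
)
print(missing_libraries(sample))  # ['libquadmath.so.0']
```

To try it for real, pass `check_binary` the path of the app executable inside the project directory.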

Also, I have no idea why the "rm: cannot remove" errors appear in the completed tasks' stderr log.
ID: 674 · Rating: 0
Keith Myers
Joined: 22 Oct 24
Posts: 29
Credit: 1,186,797
RAC: 16,229
Message 676 - Posted: 8 Dec 2024, 18:14:54 UTC

Wrapper script is misconfigured by the work generator. Benign error. The only nuisance is the extra clutter in the stderr.txt output file.
ID: 676 · Rating: 0
Dr Who Fan
Joined: 6 Nov 20
Posts: 5
Credit: 100,888
RAC: 1,445
Message 677 - Posted: 17 Dec 2024, 13:16:38 UTC

@ADMINS ANY UPDATES ??
Still receiving same "-195 errors" with the _4 tasks.
ID: 677 · Rating: 0
zupa
Joined: 21 Aug 19
Posts: 125
Credit: 888,695
RAC: 94
Message 678 - Posted: 17 Dec 2024, 20:59:18 UTC - in response to Message 677.  

I have results for 99% of the jobs.
In the system, jobs sometimes end in error, but the next attempt is successful.
ID: 678 · Rating: 0
Ian&Steve C.
Joined: 31 Oct 24
Posts: 2
Credit: 7,442,140
RAC: 300,450
Message 679 - Posted: 23 Dec 2024, 16:52:10 UTC - in response to Message 677.  

> @ADMINS ANY UPDATES ??
> Still receiving same "-195 errors" with the _4 tasks.

The "195" error is a generic error from BOINC, not specific to the app.

Your CPU is over 15 years old and lacks modern features such as SSE4 and AVX. This is probably the reason they all fail.

The Gaia_5 and Gaia_6 apps probably do not require these, which is why they work.
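Whether a given Linux host advertises those instruction sets can be read from the `flags` line of /proc/cpuinfo. A small Python sketch follows; the function names are made up for illustration, and whether the 4_gaia app really needs SSE4 or AVX is the guess made above, not something confirmed in this thread.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def missing_features(cpuinfo_text: str, wanted=("sse4_1", "sse4_2", "avx")) -> list:
    """Return the wanted flags that the CPU does not advertise."""
    have = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in have]


# Illustrative sample of an older CPU's flags line (heavily shortened).
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 ssse3\n"
print(missing_features(sample))  # ['sse4_1', 'sse4_2', 'avx']
```

On a real x86 Linux host, call `missing_features(open("/proc/cpuinfo").read())`; an empty list means all three flags are present.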
ID: 679 · Rating: 0
mikey
Joined: 26 Feb 20
Posts: 27
Credit: 6,887,052
RAC: 69,915
Message 682 - Posted: 24 Dec 2024, 15:44:41 UTC - in response to Message 679.  

>> @ADMINS ANY UPDATES ??
>> Still receiving same "-195 errors" with the _4 tasks.
>
> "195" error is a generic error from BOINC, not specifically in regard to the app.
>
> your CPU is over 15 years old and lacks modern features such as SSE4 and AVX. this probably the reason they all fail.
>
> the Gaia_5 and Gaia_6 apps probably do not require this, which is why they work.


That could be an interesting research project to do here on the returned results.
ID: 682 · Rating: 0

©2024 GAVIP-GC