How much longer will distri be down?

tyco · 17 November 2005, 18:31:17

Quote from: gandal on 16 November 2005, 10:03:16
The main thing is that no WUs get lost ...

They have always managed that so far. WUs have never been lost. *knock on wood*

There is news: http://n0cgi.distributed.net/cgi/planarc.cgi?user=decibel&plan=2005-11-17.09:17

It looks like we will have to live without stats for a few more days.

tyco · 20 November 2005, 17:24:13

It doesn't seem to be that simple. But apparently the controller card is to blame. Hopefully they will buy a new one rather than wait for a warranty replacement.

Quote:
:: 19-Nov-2005 12:32 CST (Saturday) ::

We made good progress this morning in diagnosing the problems with the
stats server. As Decibel mentioned last night, we started seeing random
read errors when pulling data off the drives. Running a SHA1 or MD5 hash
off the PostgreSQL backup file (10GB) twice in a row would never yield
the same hash twice in a row. Quite creepy to see.

At first we thought we might be dealing with an OS issue, since we'd
taken this downtime as a good opportunity to upgrade the server from
FreeBSD 5.x to 6.0-STABLE, so we got a little sidetracked debugging
UFS2 and newfs options (which we'd also experimented with during the
restore). In that experimenting, Leto managed to ferret out a weird
bug in FreeBSD 6 where the system will panic if you copy a large
directory structure to a drive which has been tuned with a large
average filesize parameter. (Sent PR amd64/89202 to the FreeBSD team)

http://www.freebsd.org/cgi/query-pr.cgi?pr=amd64/89202

Once we moved past that, though, we were still facing the weird read
errors. This morning I nicked two drives out of the raid10 volume (which
was empty anyway) and plugged them in to a spare 9500S card that we've
got on hand. We're unable to repro the read errors off that card, which
would seem to indicate that the problem is indeed the old 3Ware 8506.

Sadly, the 9500S card is only the four port model, so we can't just
swap it in and start using it, we'll have to order a new card for
the stats server.

I'm quite encouraged that we seem to have isolated the problem to the
controller card. It's under warranty, but it's a depot repair and
the vendor won't just cross-ship us a replacement. We'll have to
order a new card if we want to get the server back up and running in
a reasonable amount of time.

http://n0cgi.distributed.net/cgi/planarc.cgi?user=nugget&plan=2005-11-19.12:32
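
The check described in the quote above (hashing the same backup file twice and comparing the results) could be sketched roughly like this. This is only an illustration, not the script the distributed.net staff used: the function names `file_digest` and `reads_are_stable` are made up here, and the demo runs on a small temporary file instead of the 10GB PostgreSQL backup.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Hash a file in chunks so a multi-gigabyte file never has to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reads_are_stable(path, runs=2, algorithm="sha1"):
    """Return True if every pass over the same file yields the same digest."""
    digests = {file_digest(path, algorithm) for _ in range(runs)}
    return len(digests) == 1


if __name__ == "__main__":
    # Hypothetical stand-in for the real backup file: a small temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4096))
    try:
        print(reads_are_stable(tmp.name))
    finally:
        os.remove(tmp.name)
```

One caveat: if the file is small enough to sit in the operating system's cache, the second pass may never touch the disk or controller, so a check like this only says something about the hardware when the file is much larger than available memory.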

tyco · 23 November 2005, 12:21:29

Hope dies last!

It looks like we will have stats again soon.

Quote:
:: 22-Nov-2005 20:31 CST (Tuesday) ::

The new raid controller for statsbox arrived today (3Ware 9550SX-8) and
I've got it plugged up and running. Everything looks great so far,
although the "SX" series cards are a bit new for FreeBSD stable and we'll
have to tapdance a bit on startup to get the proper twa driver loaded. I
see that the driver version we need was committed to FreeBSD current
about two weeks ago, so the awkwardness should be short-lived, I'd
expect an MFC into stable before too long.

The universe just keeps piling on, though, and one of the new 300GB
drives we bought died today while I was trying to initialize the
RAID10 volume. I ran to Fry's to pick up a new, new drive and this
one seems fine. Right now I'm working on moving the contents of the
200GB RAID1 system volume (the OS and home directories) onto a new
300GB mirror made from two of the new drives. This will give us an
extra 100GB to play around with in our home directories, which ought
to be nice. Once I've verified that the system volume has copied to
the 300GB drives I'll wipe the old ones and rebuild the RAID10
(database) volume from the six remaining 200GB drives.

I should have all that wrapped up by tomorrow, which means we'll be
in a position to restore the stats database backup and kick off the
catchup runs from all the keymaster log files that have been piling
up during this downtime.

Thanks again for your patience and understanding as we bring stats
back to life. Hopefully this means we'll have gotten the next few
years' worth of problems out of the way all in this one massive crash.

Moo.

http://n0cgi.distributed.net/cgi/planarc.cgi?user=nugget&plan=2005-11-22.20:31
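
The capacity figures in the quote above can be checked with a little arithmetic. The sketch below assumes the usual rules of thumb (a two-drive mirror offers the capacity of one drive, RAID10 offers half the total) and ignores filesystem overhead. The function name `usable_capacity_gb` is made up for illustration, and the 600GB result for the database volume is derived here, not stated in the quote.

```python
def usable_capacity_gb(level, drive_count, drive_size_gb):
    """Approximate usable capacity for the two RAID levels mentioned in the quote."""
    if level == "raid1":
        if drive_count < 2:
            raise ValueError("a mirror needs at least two drives")
        # Every drive holds the same data, so capacity equals one drive.
        return drive_size_gb
    if level == "raid10":
        if drive_count < 4 or drive_count % 2 != 0:
            raise ValueError("RAID10 needs an even number of drives, at least four")
        # Drives are paired into mirrors and the pairs are striped together.
        return (drive_count // 2) * drive_size_gb
    raise ValueError(f"unsupported RAID level: {level}")


old_system = usable_capacity_gb("raid1", 2, 200)   # old system volume
new_system = usable_capacity_gb("raid1", 2, 300)   # new mirror on 300GB drives
database = usable_capacity_gb("raid10", 6, 200)    # six remaining 200GB drives

print(new_system - old_system)  # extra space on the system volume, in GB
print(database)                 # approximate size of the database volume, in GB
```

The first figure matches the "extra 100GB" mentioned in the quote.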

tyco · 24 November 2005, 19:59:56

Let's hope for tomorrow!

Apparently they are currently loading the stats before putting them back online.

Quote:
:: 23-Nov-2005 23:05 CST (Wednesday) ::

23:02 <+dctievent> (statsbox-iv/r72) Daily processing for 20051106 has
completed

As soon as fritz is moved back into a datacenter we should be all set. In the
meantime, it's playing catchup.

http://n0cgi.distributed.net/cgi/planarc.cgi?user=decibel&plan=2005-11-23.23:05

Edit: News from Floppus

Quote:
The rerun is already in progress; the box is just not in a location with internet access atm. As soon as it's moved to the colo, it will be accessible

gandal · 28 November 2005, 09:37:48

Well, at least there is a stats display again now.

So go and check whether everything has arrived.
It does seem a bit slower, though ... probably because of the rush ...

tyco · 28 November 2005, 18:16:21

The stats have been frozen at 23 November 2005 since this morning.

Something is happening now, though. They have just been updated to 24 November. Three days are still missing.

tyco · 28 November 2005, 20:18:59

...now the stats are current as of 25 November.

I think we have good reason to hope that we will get to admire the current stats for 27 November today.

tyco · 28 November 2005, 22:25:13

Quote:
:: 28-Nov-2005 13:29 CST (Monday) ::

In case anyone didn't notice... stats are back.

http://stats.distributed.net/team/tlist.php?project_id=8&low=1&limit=100

TMK · 28 November 2005, 23:49:03

Yay!

ernte23 · 29 November 2005, 02:15:00

Quote from: tyco on 28 November 2005, 20:18:59

I think we have good reason to hope that we will get to admire the current stats for 27 November today.

I have just admired the ones for 28 November.

Everything is there, not a single WU was lost.

Gewinnbriefe · 30 November 2005, 06:20:22

and gone again (just now) ...
hopefully only briefly ...

tyco · 30 November 2005, 12:57:28

Quote from: Gewinnbriefe on 30 November 2005, 06:20:22
and gone again (just now) ...
hopefully only briefly ...

They were already gone all day yesterday. Now they are back online, possibly only for a short time.

For the moment it is recommended not to make any changes to team or participant information. The changes could still be lost.

Quote:
:: 30-Nov-2005 00:21 CST (Wednesday) ::

Well... when it rains...

Nov 30 05:39:02 fritz kernel: twa0: INFO: (0x04: 0x000b): Rebuild started: unit=1
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x0026): Drive ECC error reported: port=5, unit=1
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x002d): Source drive error occurred: unit=1, port=5
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x0004): Rebuild failed: unit=1
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x0002): Degraded unit: unit=1, port=3
Nov 30 05:51:47 fritz kernel: twa0: INFO: (0x04: 0x000b): Rebuild started: unit=1

In plain english... another drive has failed. I've heard it's common for drives
from the same manufacturing run to all fail at the same time; I guess this is
proof.

I'm going to turn stats back on again, but I highly recommend you not make any
changes to team or participant information until this is all cleared up. It is
very possible that we will end up losing the entire array again, which right
now would mean reverting to a backup that could be days (or possibly even
weeks, depending on how long this takes).

We've already RMA'd 2 200G drives. Once those come back it shouldn't be much of
an issue for us to deal with drive failures, since we'll have some spares
on-hand. I'm also going to setup replication of critical data so that even if
we do lose the database again loss of user-modified data should be minimal.

Thanks for your patience.

http://n0cgi.distributed.net/cgi/dnet-finger.cgi?user=decibel
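
For anyone who wants to pick apart kernel messages like the ones quoted above, here is a minimal parsing sketch. The regular expression is inferred only from the six lines in the quote, not from the twa driver documentation, so other message shapes may not match. The field names (`codes`, `fields`) and the helper names are made up here, and the meaning of the two hex values is not stated in the quote, so they are simply kept as text.

```python
import re
from collections import Counter

# The six log lines exactly as they appear in the quoted plan update.
SAMPLE = """\
Nov 30 05:39:02 fritz kernel: twa0: INFO: (0x04: 0x000b): Rebuild started: unit=1
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x0026): Drive ECC error reported: port=5, unit=1
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x002d): Source drive error occurred: unit=1, port=5
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x0004): Rebuild failed: unit=1
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x0002): Degraded unit: unit=1, port=3
Nov 30 05:51:47 fritz kernel: twa0: INFO: (0x04: 0x000b): Rebuild started: unit=1
"""

# Pattern inferred from the sample lines above (an assumption, not a spec).
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) kernel: (?P<device>\w+): "
    r"(?P<severity>[A-Z]+): \((?P<codes>[^)]*)\): "
    r"(?P<message>[^:]+): (?P<details>.*)$"
)


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    fields = {}
    # Details look like "port=5, unit=1"; turn them into integer fields.
    for item in event.pop("details").split(","):
        key, _, value = item.strip().partition("=")
        fields[key] = int(value)
    event["fields"] = fields
    return event


def error_ports(events):
    """Count how often each port is named in an ERROR line."""
    counts = Counter(
        event["fields"]["port"]
        for event in events
        if event["severity"] == "ERROR" and "port" in event["fields"]
    )
    return dict(counts)


events = [parse_line(line) for line in SAMPLE.splitlines()]
print(error_ports(events))
```

Counting by port is just one way to summarize the output; it makes it easy to see which drive slots the controller is complaining about.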

Keyfinder · 2 December 2005, 19:21:02

Has nobody noticed?

The stats are running again!

At the moment they are lagging a bit, but something is happening.

gandal · 2 December 2005, 19:43:20

Will it stay that way?

(in Bavarian dialect) I've already checked twice today. Let's hope it stays that way for a while now.

Gudi · 2 December 2005, 20:20:43

On top of everything else, another hard drive has died, and things are not running smoothly again yet... Replacement drives are on the way, but until they arrive there may be instabilities, and account changes made during these days may be lost again (e.g. team joins).

tyco · 2 December 2005, 20:32:03

An update at GHN would not be a bad idea now.

http://www.rc5stats.de/ghnstats.php

ernte23 · 2 December 2005, 23:04:22

Quote from: gandal on 2 December 2005, 19:43:20

(in Bavarian dialect) I've already checked twice today. Let's hope it stays that way for a while now.

(in Bavarian dialect) Gandal, forget it, we don't stand a chance there. It is what it is; the main thing is that it gets going again at some point.

tyco · 2 December 2005, 23:08:59

(in mock dialect) Well now, what's all this? Have we ended up at Team Bayern this year?

ernte23 · 2 December 2005, 23:12:51

(in Bavarian dialect) We're Bavarians, the two of us

gandal · 2 December 2005, 23:33:52

Quote from: tyco on 2 December 2005, 23:08:59
(in mock dialect) Well now, what's all this? Have we ended up at Team Bayern this year?

Should we switch?
