(last edited November 19, 2009)
This is a running log of events that might affect the Bren computing community.
11/19/2009 10:35
mail queues are running in real-time and operations should be as normal.
11/19/2009 8:15
our spam firewall failed under an abnormally high volume of message deliveries and queued about 10,000 valid emails. i have unstuck the firewall and messages are now flowing, but it will likely take a few hours for the servers to catch up to the incoming stream and be real-time. it appears that all messages were queued, but there is a chance that some messages were lost. we are still investigating this issue and will post more information as we find it.
11/18/2009 11:20
the IC incoming smtp servers where bren email is processed are misbehaving. this has resulted in some queued mail. resolution forthcoming within 30 minutes.
09/15/2009 10:50
routing appears to be restored. access to ic email is as normal.
09/15/2009 10:35
The core router in Physical Sciences North is down, so access to ic email is not available. Status updates are available from the NOC at x2800.
09/09/2009 9:09
mailman issues - the non-technical explanation is that the service that processed the mail for the mailing list server had a problem with some messages depending on how they were formatted. some mail was going through and other mail was being held in limbo in a location where it normally would not be held. in looking at this issue we found the problem and resolved it, thus releasing all the messages being held in limbo, which were then delivered.
12/06/2008 5:20
official update from campus network programmer below:
There was intermittent backbone network connectivity today, Dec. 6,
between approximately 15:15-16:45, due to a denial-of-service attack
originating from a campus department. The volume and nature of the
traffic proved disruptive to campus routing services, and we'll be
investigating potential mitigation measures for future incidents. The
offending host was blocked and campus services returned to normal.
12/06/2008 5:00
service has been restored for 20 mins. when more info is released from the campus network programmers i will include it here.
12/06/2008 4:20
the campus network programmers are working on the issue. you can call 893-2800 to hear the network status message. we are connected to the campus core via a switch in eng I (556-c.noc.ucsb.edu, 556 ENGINEERING I). status for core switches is available at https://noc.ucsb.edu/hobbit/NGB/core/core.html
12/06/2008 3:20
the core campus network is currently having some hardware issues which are causing the connection from bren to the outside world to be unavailable. the connectivity has been up and down several times in the last 20 minutes. an update will be posted when the issue is resolved or more is known.
09/04/2008 9:00
due to the campus wide power outage scheduled sept. 13th and 14th all bren servers and services will NOT be available from saturday the 13th at 6pm until sunday the 14th at 6pm.
prior to leaving for the weekend on the 13th please shut down your computer and power it back on when you return the following week. any computer or peripherals left powered on will not receive power during the outage. emergency electrical services will continue to operate during this period.
if you have any further questions please let us know by replying to this email or calling us at x7794
08/14/2008 13:40
we have replaced a switch in the server room and services have been restored. if your computer is still having issues please reboot. if your problems persist please send email to request@bren.ucsb.edu or call x7794
08/14/2008 09:00
problems with a building switch have been causing intermittent connectivity. we are working with hp to resolve this issue.
12/26/2007 16:00
the mail server is up and operational. some additional work will be done on 12/28
12/26/2007 7:20
mail server upgrades have begun
12/15/2007 9:00
the mail server is scheduled for upgrades on dec. 26th from 7am - 1pm
11/05/2007 13:20
the file server needs to have some work done this evening. we are hoping that the work will only take an hour or two, but to be safe we are scheduling it from 11pm until 3am.
please save your work and log off before 11 to ensure that your data is not lost.
11/03/2007 10:20
the file system check has finished and file server babylon is back on line. we are currently maxing capacity on some of our ups devices so we will continue to check on this over the weekend.
11/03/2007 9:40
we have a UPS in the server room that is having issues and has taken down the file server babylon and the web server. we are moving servers to new UPS devices. babylon is coming up dirty and needs to be checked before we let people access data.
05/08/2007 9:20
after the power outage one of the pieces of network hardware came back in such a state that it would periodically saturate the network with traffic, causing congestion. this congestion was causing people to lose access to network shares (like your h:\ drive) and to email. we think we have isolated the issue and that bren computing services should be as they were before the power outage. as usual, if you have computing issues please let us know by sending email to request@bren.ucsb.edu
many other departments on campus are still experiencing issues related to the power outage. if you are experiencing problems connecting to bren services from another department please contact your local department's computing staff for assistance.
05/08/2007 21:45
Power has been restored to bren and most of campus.
05/08/2007 19:20
There is a major power outage on campus that is impacting bren. Network and servers are still available from outside campus and we will continue to monitor the status.
we have isolated the hardware that has failed and have put measures in place to bypass this hardware. we will be repairing this hardware shortly and will need to transition back to it at that point.
11/09/2006 9:00
we are trying to isolate the issue. in doing so access to the mail server will be sporadic as we move things around.
11/09/2006 6:00
we are working on the issue with icess. we are trying to determine if it is campus hardware or local hardware that is at fault.
11/09/2006 1:55
connectivity to the icess network where our mail and unix file servers are housed is down.
11/05/2006 5:55
hardware installed. snoopy is back up.
11/05/2006 11:05
Snoopy has a hardware failure. Parts may be here later today but most likely Monday am. If you need to use office apps remotely you can use woodstock.bren.ucsb.edu. If you need to use science apps you will need to use a lab computer.
8/30/2006 8:05
As most of you noticed, the Bren mail server was not responsive for a good portion of the day on Tuesday 8/29. There were 2 issues that occurred around the same time. First, the permissions on the mail spools were accidentally changed, causing the mail server to repeatedly ask for authentication when connecting to the server. We were able to resolve this issue, but it then became apparent that the mail server was being flooded by mail from a compromised host on our wireless network. We stopped the ability for mail to be routed from the wireless network to the mail server, but in the 45 minutes that it was routing mail it flooded the server with close to 60,000 messages. The mail server at that point needed to process these messages in the queue. It appears that no email messages were lost (with the exception of the messages from the compromised host that were discarded) and things should be back to "normal" now.
We monitor and patch all systems that are part of the esm (bren) domain but for laptops and computers that are connected via the wireless the best defense is to keep your PC up-to-date with patches and not to open any suspicious email or email attachments. We're sorry for the inconvenience and as always please feel free to contact request with any questions or concerns.
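For a rough sense of scale, the figures quoted above (close to 60,000 messages in about 45 minutes) can be turned into a rate. This is only back-of-the-envelope arithmetic on those two rounded numbers; the function name `flood_rate` is made up for this illustration and nothing here was measured on the mail server.

```python
def flood_rate(messages, minutes):
    """Return (messages per minute, messages per second) for a flood."""
    per_minute = messages / minutes
    per_second = per_minute / 60
    return per_minute, per_second


# Rounded figures from the entry above; both are approximate.
per_minute, per_second = flood_rate(60_000, 45)
print(f"about {per_minute:.0f} messages/minute ({per_second:.1f} messages/second)")
```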
8/29/2006 19:35
the mail server is once again processing messages in real-time. no mail was discarded only delayed.
8/29/2006 14:35
updated time looks closer to 6 or 7. we will continue to monitor this through the night.
8/29/2006 12:35
if the mail server continues to process items at the current rate the mail server will be processing items in real time around 4 or 5pm today.
8/29/2006 11:35
the mail server is pushing messages through as fast as it can. there is a backlog of approx. 50 thousand messages that it is plugging through. the backlog was from a compromised host on our wireless network that was flooding our mail server with relayed messages.
8/29/2006 9:35
mail server has an extremely long queue. we are working to push messages through but the backup is significant.
5/29/2006 7:35
the work on the network and servers has been completed. all systems appear to be operating as normal. please send a request to request@bren.ucsb.edu if you find otherwise.
5/26/2006 10:35
i will be making some changes to the network and servers this monday from 1am until approximately 9am.
during that time services may be periodically available but to ensure data integrity please do not logon until the computing news page (http://www.bren.ucsb.edu/ow/Default.asp?BrenComputingNews) indicates that the maintenance is complete.
please log off when you leave your computer.
should you need to check new email or send an email the webmail service will be available as normal.
4/22/2006 15:35
parts came early. we are back up!!!
4/22/2006 13:55
updated eta is 4PM
4/22/2006 07:55
parts are coming from vegas and the eta is 3:30 PM
4/22/2006 06:45
i am waiting on a call back from gate support about when the parts will be here.
4/22/2006 01:45
they are now saying that the parts in san jose are not the right one and they need to send parts from las vegas. they have an eta of 3:30 PM.
4/22/2006 00:30
back plane is coming from san jose. should be here before 7am (he says with fingers crossed)...and yes it is now officially my birthday.
i have taken down the terminal servers snoopy and woodstock to prevent any user data corruption.
4/21/2006 23:30
the back plane on babylon (the old mirror) server has failed. i am working with the vendor to get us a new one here asap. if they are not able to do this within a reasonable time i will replace it with a mirror.
4/17/2006 10:30
We are still running off the mirror server at the current time. Another mirror on low cost disk has been built and will be tested as an alternate server later this week. We are still working with vendors on storage solutions that will put us back on faster disk.
4/11/2006 10:30
We are still running off the mirror server at current time. We are working with a number of vendors to provide a server and storage solution that will put us back on faster disk.
4/5/2006 9:30
Power has gone down again, both times unplanned. We are awaiting word from FM on the status of the problem.
4/5/2006 9:00
Power has been restored and the mail server is coming on-line. We will likely need to reboot a couple of times to re-attach storage nodes.
4/5/2006 7:30
Power outage in ellison has taken down all mail services.
4/2/2006 14:30
our main file server is not functioning and we are currently providing access to your data via a mirror of the data which is on slower disk. please be patient as we restore access to our primary server.
for those that want to know more details....after the planned maintenance this sunday the storage nodes on our main file server did not come back up with the proper configuration. working with the vendor and replacing numerous parts did bring the nodes back up, but the data on the disks is not present. i am continuing to work with the vendor to remedy the situation, but at this point a recovery of the data appears to be our only option. back-of-the-envelope estimates are 80 hours for full recovery from the mirror without users accessing the data. throttling the recovery and allowing user access will take closer to double that time. good news is that the mirror, which we have never had to use, worked as designed and there was minimal downtime for access to user data.
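To put those estimates in calendar terms, here is a small sketch that converts them to days. It assumes "closer to double" means exactly 2x, which is a simplification, and the names `FULL_RECOVERY_HOURS`, `THROTTLE_FACTOR` and `hours_to_days` are made up for this illustration.

```python
FULL_RECOVERY_HOURS = 80   # estimate quoted above, with no user access
THROTTLE_FACTOR = 2        # assumption: "closer to double" taken as exactly 2x


def hours_to_days(hours):
    """Convert a duration in hours to days."""
    return hours / 24


throttled_hours = FULL_RECOVERY_HOURS * THROTTLE_FACTOR
print(f"offline recovery:   {hours_to_days(FULL_RECOVERY_HOURS):.1f} days")
print(f"throttled recovery: {hours_to_days(throttled_hours):.1f} days")
```

Under that assumption the choice is roughly three and a half days with no access versus about a week with access.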
4/2/2006 7:30
The server problems were a bit larger than just the power supply. The vendor has a part coming here in 4 hours which should fix the problem.
4/2/2006 6:30
Babylon scheduled maintenance starting.
2/24/2006 6:30
From what i can tell from here it looks like we have a circuit that has blown or one of the ups units has gone south. there are a few servers down, most of them not front line. we are looking into the problem and hope to have it resolved in the next hour or two.
1/29/2006 9:30
Babylon updates done.
1/29/2006 7:00
Babylon updates in progress.
1/18/2006 11:30
We think we have the problem with the mail scanner solved. Please let us know if you notice any strangeness.
1/18/2006 9:41
The virus scanner is now up and working. We are looking into what caused it to stop. We may need to take the service down to make changes throughout the morning.
1/18/2006 9:25
The virus scanner on the mail server is having issues. We are in the midst of reconfiguring it. Mail is being spooled and will be delivered when the virus scanner is back on-line.
1/18/2006 8:05
The mail server is having some problems and we are looking into it. Email may be up and down for a bit.
1/11/2006 3:05
A node in our file server was compromised and taken offline. The disruption was due to switching to the other node and powering down the compromised node. At the current time there is no evidence of an external compromise vector, so we are looking at both internal and external traffic to help us identify the extent of the problem. As usual, if you notice any abnormal behavior please notify request@bren
1/9/2006 3:00
FM got the security system working again. Be sure to email request@bren if you notice any problems.
1/5/2006 5:00
FM is trying to get an engineer out from the hardware vendor to look at the problem.
1/3/2006 10:00
FM is still working on the issue.
12/30/2005 08:45
It looks like some parts may need to be ordered and shipped. If this is the case a resolution will not be available until the parts arrive.
12/29/2005 19:15
After working with the vendor a solution has not been provided. They are going to try placing some other equipment on the security system and see if they can do some trouble shooting with it.
12/29/2005 11:15
The access control system for the lab wing is down. FM is working on the problem.
12/26/2005 10:15
Mailserver maintenance completed. If you notice any problems feel free to contact request@bren.ucsb.edu or use the webform for request located on the computing page http://www.bren.ucsb.edu/services/computing/index.php
12/26/2005 08:00 - 15:00
Maintenance on the ICESS / Bren mailserver and the ICESS network
on Monday December 26th. You should plan on email (including webmail and
IMAP) being down from 8am to 3 PM (although I hope to be done by noon), and
ICESS and the UNIX network to be unstable during that time. In theory the
Bren Windows network will be unaffected, but there are no guarantees. If
there are any questions please feel free to contact request@bren.ucsb.edu.
11/20/2005 10:00
Mail server is back up. We are going over the logs and hope to have the issue identified.
11/20/2005 1:07
Mail server at ICESS is down. Webmail will NOT work. The mail server is having problems and has gone up and down a few times. We are looking at the problem and will have the system up asap.
10/19/2005 11:07
After extensive communication with the vendor, the file server is back up. You should have access to everything. If you are having problems, or your drives are not mapped, restart your computer and log in again. Email request@bren.ucsb.edu with any problems. Thank you for your patience during this unplanned outage.
10/19/2005 08:55
The file server is down. You will not have access to H:\ including Outlook mail file. If you are already logged on, or at home, you can use webmail to send and receive new email. http://webmail.bren.ucsb.edu/ . We hope this will be resolved before noon.
10/06/2005 07:40
Snoopy is back up and the mail server is routing mail.
10/06/2005 04:35
Mail server at ICESS is down. Webmail will NOT work.
10/05/2005 17:45
Snoopy is down. I cannot get to it remotely to work on it, so it will be down until 7:30 am Thursday morning, at the earliest.
New mail may be accessed using webmail.
05/16/05 08:15 am
The unix file systems have been restored. All systems should be functioning as normal.
If you have drives that were not mapped (such as u:) please log off and back in to re-establish those connections.
05/15/05 10:45 am
The unix file server is having issues sharing file systems. Access to the U: drive is down as well as access to webmail.
04/01/05 10:30 am
This am there was a break in one of the fibers on the campus backbone causing a loss of network connectivity to Cheadle, Campbell, and Ellison (where our mail and unix file servers are). The break is fixed but may be up and down while they are testing the connection.
If you have drives that were not mapped (such as u:) please log off and back in to re-establish those connections.
04/01/05 9:00 am
There is a loss of connectivity on the campus backbone; a fiber cut is suspected. Access to mail and unix file systems is not available. no, this is not an april fools joke.
02/26/05 1:00 pm
The only correlation I can find is that the SAV process runs at the same time every week. Stopped the SAV process. Will monitor this weekend.
02/26/05 1:00 pm
Babylon's cluster service had some issues and was not able to allow access to data disks via the normal shares. Restarting the service brought the shares back up. We will continue to monitor the server. Send mail to request if you have problems.
02/19/05 4:00 pm
Babylon's cluster service had some issues and was not able to allow access to data disks via the normal shares. Restarting the service brought the shares back up. We will continue to monitor the server. Send mail to request if you have problems.
02/17/05 11:00 am
New symantec client being installed to address the recent security warning regarding UPX Parsing Engine Flaw. You may get a notification that the definitions are out of date but in fact they are already in the process of being updated.
02/02/05 9:00 am
The NEW 4th floor office wing switch is up and running. The brains module (switching module) was not interacting with the back plane and was replaced by one of our shelf spares.
02/02/05 3:00 am
The 4th floor office wing switch is down. I tried replacing the power supply but it did not work. Will try some other options in the am.
01/31/05 8:00 am
Mail server operating as normal. Moving from the 1000Mbs network to the 100Mbs network seems to have stabilized the problem. We are continuing to monitor the server.
01/29/05 8:00 am
Mail server is experiencing connectivity issues. We are monitoring the problem.
The mail server has been up and down a number of times today, causing periods where mail can not get in or out. All incoming mail will be resent or cached at the ucsb mail hub.
09/22/04 8:00 am
The GIS lab is open for computing again. The new systems are in and ready to go. Drop in and take a look.
09/17/04 8:00 am
We are having some problems with permissions sticking on profiles and user home directories. We are writing a script to batch change the permission to give users access to their home directories while preserving access users have granted to others.
09/15/04 1:00 pm
GIS lab computers and bulk buyin computers are done imaging. The lab is being setup and buyin computers will be distributed by mid next week.
09/13/04 1:00 pm
New computers for the GIS lab and bulk computer buyins arrived. The GIS lab will be used to stage the installation and configuration of these systems.
09/13/04 6:00 am
Bringing online all bren servers and networked devices.
09/12/04 3:00 pm
Power restored. All Bren/ICESS services will be restored early monday am.
09/12/04 1:00 pm
Estimate of 2-4 given for restoration of power.
09/12/04 8:50 am
Power outage has begun.
09/12/04 7:30 am
Power still out, they are now saying between 8 and 9.
09/10/04 10:30 am
We will migrate from the old storage and file server cluster this weekend. Downtime will start saturday night at 10 PM.
09/08/04 10:30 am
New storage and file server cluster is online. We are conducting some stress tests before we bring the system into production.
09/07/04 10:30 am
The email server has some problems now. We're looking into it right now and will let you know when things are back to normal.
09/01/04 10:00 am
After 24 hours of uptime the storage is not reporting any critical errors. We hope to have the storage long enough for us to migrate to the new servers and storage.
08/31/04 10:00 am
The storage is somewhat stable. There have been a few errors recorded in the error log but dell tech support has no solutions at the current time. We have parts on hand should there be any further problems. The new storage has arrived and a tech will be here friday to assist in the installation/configuration to ensure we have it up in a timely manner.
08/24/04 1:30 pm
The storage is back up and will hopefully be stable until we can migrate to our new storage. Latest update from gateway is that one of the parts in our new storage did not pass the inspection so a new part is being ordered. This may push delivery to Friday or early next week.
08/24/04 10:30 am
I am running a few more diagnostic tests which should tell us if the storage is stable enough to allow you all back on. I am hoping the diagnostic tests will finish around noon and I will know more then. The data are available and the storage has been stable since 5am. I am hoping to restore connectivity shortly after the tests are done.
08/24/04 9:00 am
The storage has been up and stable since 5am. We are looking at the remaining errors to decide if we need to replace any more parts before we let users back on. Looks like the end may be near.
08/24/04 6:30 am
On the phone with tech support. Trying to diagnose the errors that are still showing up in the logs. Hope to have the storage available to users this am. A full backup from 8/11 is being restored. We will make the data available so that users can recover any corrupted files on an as needed basis. No individual recoveries will be performed.
08/24/04 1:50 am
The second disk is done rebuilding. The third disk has started. I am still seeing some errors in the event logs, will talk to tech support in the am for more information.
08/23/04 9:20 pm
The first disk is done rebuilding. The second disk has started. Third disk will need to be replaced between 2 and 3 am.
08/23/04 5:20 pm
We have 3 disks to rebuild; each one will take 4-5 hours due to the volume of data on them. At that point the shares can be recreated and the data will hopefully be available to users again.
side bar - our NEW storage should be here this thursday and hopefully be operational that day. Data will be migrated off of the current storage to the new devices. Please check back here for further updates.
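The rebuild estimate above can be sketched as a simple time window. This assumes the three rebuilds run strictly one after another with no gaps for swapping disks, starting at the time of this entry; the function name `rebuild_window` is made up for this illustration and it is not an official schedule.

```python
from datetime import datetime, timedelta


def rebuild_window(start, disks, hours_low, hours_high):
    """Return (earliest, latest) finish times for back-to-back disk rebuilds."""
    earliest = start + timedelta(hours=disks * hours_low)
    latest = start + timedelta(hours=disks * hours_high)
    return earliest, latest


# Assumption: rebuilds begin at the time of this log entry (08/23/04 5:20 pm).
start = datetime(2004, 8, 23, 17, 20)
earliest, latest = rebuild_window(start, disks=3, hours_low=4, hours_high=5)
print("earliest finish:", earliest.strftime("%m/%d/%y %H:%M"))
print("latest finish:  ", latest.strftime("%m/%d/%y %H:%M"))
```

Three disks at 4-5 hours each works out to 12-15 hours in total, so under these assumptions the last rebuild would finish early the following morning.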
08/23/04 3:50 pm
Parts have arrived, confirming plan of action with tech support.
08/23/04 1:40 pm
Dell is sending out another controller for us to try (different firmware revision). We have also started recovering data from tape in the event that there is major file corruption. Due to the power outage the last date of backups is 8/11.
08/23/04 12:30 pm
I am now able to see the storage but I can not tell if any of the data has been corrupted as yet. I am working with tech support to get the system stable so that we can bring it online and copy the data to other hardware.
08/23/04 6:30 am
Back on the phone with tech support.
08/22/04 10:45 am
I am still not able to get the storage to be recognized by the server. I am off to be a groomsman in a wedding, will pick up the battle on monday.
08/21/04 8:45 pm
I have not been able to get the controllers talking to server. Dell is sending new controllers. From the led indicators on the front of the system it looks as if the disks are still being rebuilt.
08/21/04 2:45 pm
The controllers are offline but the disks still look like they are rebuilding. On the tech's advice I am letting the process run until the activity has stopped. Parts are here to be replaced but the rebuild needs to finish before that happens.
08/21/04 11:00 am
Disks still rebuilding. Parts on the way. Will know more around 3.
08/21/04 11:00 am
Dell is sending more parts. Should be here around 3pm. Rebuild of the failed disks is still ongoing.
08/21/04 9:30 am
While in the middle of a rebuild another disk has failed, causing the cluster and access to be offline. On the phone with tech support.
08/20/04 9:40 pm
The first of the 3 new disks is in and rebuilding. Looks like it might take a bit longer than estimated. I will check the status of the rebuild later tonight.
08/20/04 3:40 pm
Storage is still up. We are being shipped 3 new disks and some new cables. Should be here between 8 and 9. Drives need to be replaced one at a time and given time to rebuild data onto them. The whole procedure should take about 5 or 6 hours. Will keep you posted.
08/20/04 2:40 pm
Storage is back up. We are still trying to figure out why it went down last and how to prevent it from happening in the future.
08/20/04 1:20 pm
Parts came, were installed, did not fix the problem. On the phone with tech support.
08/20/04 11:20 am
Driver just called from oxnard, parts will be here within the hour.
08/20/04 9:35 am
Awaiting parts. Data integrity is good. Drive failovers worked as planned. Controllers still preventing us from access as normal.
08/20/04 8:45 am
We are awaiting more parts which will arrive by 1 pm.
08/20/04 7:25 am
On the phone with tech support. Storage failed at 6:55 am with disk error. No time estimate on when it will be back up.
08/20/04 7:25 am
The storage is down again. No access to H: drive. Use webmail. Will call tech support.
08/19/04 11:00 pm
After 6 hours on the phone with tech support the storage is back up and operating as normal. We will continue to monitor it through the night.
08/19/04 5:00 pm
The parts were replaced but the controller is still not operating as normal. On the phone with tech support... will keep you posted.
08/19/04 3:30 pm
This is a scheduled downtime to replace more parts on the file server. Do not log onto ESM on campus or on the terminal server.
08/19/04 1:30 am
The rebuild has finished; all raid units are running in a "normal" state.
A down time will be scheduled for later today to replace the memory on secondary controller and the matching memory on the primary controller.
08/18/04 10:30 pm
The correct parts showed up from Hollywood!!!
The storage is rebuilding from a degraded state. When that is stable we will put the memory into the secondary raid controller, reinitialize it and all should be somewhat normal again. I am typing this with my fingers crossed.
It looks like the raid rebuild will take about 2.5-3 hours so I will check back on it then and let you know where we are.
08/18/04 6:30 pm
Dell parts arrived....just not the parts we needed. On the phone with tech support now trying to get them to have someone drive one over from Bakersfield.
08/18/04 4:30 pm
Dell parts should be here at 6:22
We are keeping our fingers crossed that this will resolve the current problems we are having and leave the storage in a stable state.
Check back here for more information as it is available.
08/18/04 11:30 am
Bren storage is sporadic (no regular access to H: drive).
New email may be read and email may still be sent using http://webmail.bren.ucsb.edu
Contents of the storage on babylon (h drive and all other mapped drives) are intact but, due to issues with the hardware, unavailable to users on the network.
If you need to store files you can use your u: drive
Parts will be here later today and installation should take about 1 hour 45 minutes. Check back here for more information as it is available.
Technical Information We are waiting on Dell for parts (cache memory) that will fix the secondary raid controller on our 660F. We have also lost a disk on each raid unit (660F and 224F) and are awaiting replacements. Currently the 224F is initializing to 99% and failing. We are also awaiting delivery of our new storage system from gateway which will replace the current storage system on babylon.
...will keep you posted.
We will be sure to let you know about any future unplanned outages well in advance -Bren Compute Team