
Tobi Waves



NordU 2003 Talk: Open Source at Turku City

Thursday, February 13, 2003 08:29 // Aros Congress Center, Västerås, Sweden

by Eija Onnela eija.onnela@turku.fi

Turku is a city in Finland with 173'000 inhabitants, 13'000 city employees, 5000 workstations and 150 man-years spent on IT every year. Central IT has 54 people.

Reasons for looking into open source: a) OpenOffice in Finnish, b) new M$ licensing policy, c) report on usability of OpenOffice and Linux in Turku City http://www.turku.fi/english/administration_economy/it_department.html

Test Setup

Test setups were created for Linux (1 person) and Windows (4 people, sponsored by MS).

The goal of the upgrade is to simplify system management while keeping the user experience at a good level.

The Linux software environment is based on SuSE, using Webmin for administration and OpenAFS for home directory access. On the application side they used OpenOffice and Netscape. Installation was done via CD because of the slow network environment. Everything runs off a single Linux server.

The Windows setup was done with RIS through PXE. On the software side: Office ... special applications distributed through SMS. 9 servers.

Problems

On the Linux side the problems were mostly due to interoperability with old office documents, users not being used to the Linux environment, and many small applications not being available on Linux.

On the Windows side, the problems were mostly that applications which were OK on Windows NT did not work on Windows XP.

There is a lot of resistance from the user side and from department admins regarding a switch to Linux. Users do not want to work with a different environment and local admins do not want to change to a centralized solution.

Conclusion

Two TCO analysis projects are still underway to determine the financial implications of the two solutions. A decision has not been reached yet on whether to go for Linux or Windows.

(www.turku.fi ...)

Eija is currently under quite a lot of pressure because all major cities in Finland are waiting on the outcome of the Turku project. And things are not looking all that good for Linux because of missing applications and because users as well as local admins are stalling. Maybe Citrix and terminal server can help.

 

NordU 2003 Talk: New Features in Solaris

Thursday, February 13, 2003 10:26 // Aros Congress Center, Västerås, Sweden

by Richard McDougall r@sun.com

Reasons for Big Memory and thus 64bit Solaris

Machines with up to 500 GB of memory are possible. This opens new possibilities, for example keeping huge databases entirely in memory and thus eliminating all the read performance problems at the file system level.

UFS in Solaris >= 8

File creation is 10 times faster, file system creation is orders of magnitude faster, directory lookups scale linearly with directory size.

New Tools in Solaris 8

prstat (a better top), mdb (successor to adb and crash), lockstat -k (kernel profiling), kstat (command and perl library for kernel statistics), extended truss (traces library and program calls), new accounting system, cpustat for cache and bus statistics.

Solaris 9 Resource Management

RM is an infrastructure to automate performance management.

Traditionally machines had to be sized quite big because the workloads were very uneven. With RM it is possible to add workloads at a low priority and thus use all available CPU time without disturbing the main task on the machine.

RM lets you group processes into projects, assign resources to them and also do accounting on them. In /etc/project (or via LDAP/NIS+) you can define process groups by program, user and group and assign resources to each group. With the newtask command a program can be explicitly assigned to a certain project and thus gets access to the respective resources.

Resources are defined in pools, which let you select the number of CPUs and the type of scheduler to be used. Projects are assigned to pools, which then define the resources available to the processes in a project.
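To make the project idea a bit more concrete, here is a minimal Python sketch that parses one entry in the colon-separated /etc/project layout (name, id, comment, user list, group list, attributes). The sample entry, the project name `batch`, the pool name `batch_pool` and the helper names are made up for illustration; none of them come from the talk.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    projid: int
    comment: str
    users: list
    groups: list
    attributes: dict


def split_list(field):
    """Split a comma-separated field, dropping empty items."""
    return [item for item in field.split(",") if item]


def parse_project_line(line):
    """Parse one 'name:id:comment:users:groups:attributes' entry."""
    name, projid, comment, users, groups, attrs = line.rstrip("\n").split(":", 5)
    attributes = {}
    for pair in attrs.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    return Project(name, int(projid), comment,
                   split_list(users), split_list(groups), attributes)


# Hypothetical entry: a low priority project bound to a pool.
SAMPLE = "batch:100:low priority jobs:alice,bob:staff:project.pool=batch_pool"
project = parse_project_line(SAMPLE)
print(project.name, project.users, project.attributes)
```

On a real system the projects command and its relatives are the supported way to inspect and edit these entries; the parser only shows how a project ties users, groups and a pool together.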

The resource constraints facility lets you send signals to programs violating resource limits or also deny them access to resources.

Many of the Solaris performance tools know about the projects concept and can report based on projects instead of processes.

Relevant commands: projects, proj{add,mod,del}, newtask, pooladm, poolcfg, poolbind.

Check (www.sun.com ...)

 

NordU 2003 Talk: GDB old Dog new Tricks

Thursday, February 13, 2003 11:11 // Aros Congress Center, Västerås, Sweden

by Andrew Cagney

Gdb is the most widely used debugger. Only MS is still doing their own thing; most other companies have switched to gdb entirely or are at least helping it succeed.

New Tricks

Languages: C, C++, Java, Fortran, Scheme, Modula-2

Expression parser understands function expressions written in the language of the program and can evaluate them on the fly.

Remote debugging for debugging embedded systems remotely with gdb server.

Program tracing with trigger points to do on the fly monitoring without stopping the program.

Out in the next few weeks: TUI, the gdb split-screen, curses-based text GUI.

The next version will know about multiple architectures. This means a single instance of gdb will be able to remote debug code on different architectures. The eventual goal is to be able to transparently step into remote procedure calls.

GDB is introducing a new interface called MI (machine interface) to simplify the use of gdb from front end programs. There are very strict criteria on changes to this interface to ensure that front-ends can rely on the stability of the interface.

Handle debugging of optimized code with CFI (gdb 6)

Old Dog

Gdb 1.x came out in 1986 for SPARC, VAX, Tahoe and GOULD ... Andrew is looking for it.

No really big new features since 1991; mainly new architectures were added. Almost any CPU ever designed is supported by gdb (and gcc).

Still, the code base is growing exponentially; they are at 1.5M lines now.

A few years back gdb supported 36 architectures. As this is difficult to maintain they have been actively eliminating old code ... they are down to 22 now.

Code Quality Improvements

Select -Werror flags for zero warning tolerance.

GNU coding standard. Strict ISO C. Eliminate subjectivity: reindent with GNU indent and don't argue.

GDB specific lint which checks for various common problems. Code which does not pass is not accepted.

Move to opaque objects and avoid globals.

(www.gnu.org ...)

 

NordU 2003 Talk: The quest for the lost hardware docs

Thursday, February 13, 2003 11:58 // Aros Congress Center, Västerås, Sweden

by Jes Sørensen jes@wildopensource.com

Jes has been working on the Linux kernel for the last 10 years. He specializes in driver development. Currently he is involved with the consultancy Wildopensource.

The Basic Problem

Drivers are needed to talk to hardware.

Documentation lets us write better drivers, as we have to guess less. Not that documentation generally describes what the hardware really does, but it is a start.

Binary drivers are a problem, as the kernel API changes without regard for binary compatibility with existing drivers.

Open code is generally better because of peer review.

Why are people not releasing specs?

Many companies think that they will lose their competitive edge if competitors know how to program their HW.

Jes makes the case that this is not true: giving out programming information is not a real IP issue anymore, as today the core of a company's IP is mostly within the chip and not in the interface.

Another problem is that many companies, nVidia for example, do not own all their interfaces due to cross-licensing and patenting issues, and are thus not allowed to release source.

Convincing People to release their Specs

Having a driver in the official kernel gets it some automatic maintenance as the code is updated with the kernel, or at least compatibility problems will be discovered more quickly.

Better public acceptance due to a good image within the community and thus better sales.

Free help for debugging the hardware as external driver writers tend to find new problems.

Addressing Execs

Engineers are normally not a problem, they like to share information and help each other out within the limits of the environment they work in.

NDAs are acceptable if they just protect the documentation and not the code written based on the information gained from the docs.

GPL helps as it ensures that the source can not be taken by competitors and included in their closed product.

Flaming and yelling shuts doors. Good behavior helps. Don't use SlashDot.

Petitions might help, but only if the company is interested in this.

What to do when told NO?

Look for alternative vendors, OEMs might be more friendly. E.g. Broadcom's OEMs.

Sometimes new chips are largely based on the previous model and thus the interfaces are similar.

Use reverse engineering, but beware if you live in a non-free country like the US, where the DMCA can send you to jail for years. In the EU, interfaces can currently be legally reverse engineered for the purpose of interoperability if the vendor refuses to give specs. Also be aware that some countries believe they have jurisdiction everywhere.

Reverse Engineering

Take a close look at existing drivers for other OSes.

Snoop drivers' register access.

Use srandom to figure out the correct access sequence (Andrew Tridgell of Samba fame used this to figure out how the Vaio Picture Book camera works).

To avoid licensing issues, get a friend to read the specs and have him tell you how it works.

How to Write a Driver

Do not use a compatibility Layer. Write the driver Linux specific.

Examples

Jes has been using these techniques to write drivers for Alteon, Intel EEPro 100 and Broadcom 570x.

Alteon is no more. With Intel there are now excellent links and they even submit patches. Broadcom was really difficult to work with, but due to pressure from their customers they seem to be coming around.

 

Ufff, NordU talk done

Thursday, February 13, 2003 15:08 // Aros Congress Center, Västerås, Sweden

I really love speaking in front of an audience. This is why it is so easy to convince me to come to conferences. During the last hour I finally had my own talk here at the NordU conference. I was talking about scalable system management concepts in a large environment, presenting the major tools we have developed at the ISG.EE. There were not all that many people in my talk, but taking into account that there are only slightly more than 100 people at the conference and that there were 3 sessions in parallel plus a vendor exhibition, I am actually quite happy. I think I drew over 30%.

Oh yeah, and I kept to the set time of 45 minutes almost exactly: I finished my talk 2 minutes before the allotted time, with a short break halfway through for questions. Now I just need to find a way to lose that adrenalin to be able to concentrate on other talks again.

 

NordU 2003 Talk: Linux on the Itanium

Thursday, February 13, 2003 15:24 // Aros Congress Center, Västerås, Sweden

by Bruno Cornec from the HP/Intel Solution Center.

HP, up to the management level, is now taking Linux seriously. They finance most of the ia64 and wireless work. They employ several key Linux developers, for example Jeremy Allison of Samba fame.

Itanium is HP's future. All operating systems the users require will be provided. This includes Linux, Windows, HP-UX and OpenVMS.

Itanium is a new architecture co-developed by HP and Intel. It includes hardware IA32 emulation. The chip includes the floating point unit from PA-RISC and is thus very fast in this area.

While Itanium is available to whoever wants to buy it from Intel, HP has developed their own high performance chip set for the Itanium 2, from which they hope to gain a competitive edge.

HP is not only working on the ia64 architecture but also supporting ports to PA-RISC and Alpha.

HP's David Mosberger is responsible for the Linux ia64 port. His main focus in doing the port is to comply with all the Unix standards for 64 bit as well as keeping the ia64 port close to the ia32 version to ease portability. The ia64 port also includes access to the ia32 hardware emulator.

Several vendors already provide Itanium compatible products: Intel's C compiler, Oracle, Side Effects Houdini, MSC.Linux, MSC.Nastran, SCI, Quadrics drivers, Myrinet, SSI, Alinka.

HP is supporting external developers in improving the gcc code generation for the ia64 in order to get it on par with Intel's compiler.

HP is working with INRIA on porting MandrakeCluster to the Itanium Platform. (clic.mandrakesoft.com ...)

Tips for porting to the Itanium

Things that already work on Alpha will just work.

Pointers and longs are 64 bit.

Big-endian is settable for certain programs as required.

Use int32_t, int64_t, u_int8_t

Compile with -Wall and take the warnings seriously.
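The point about fixed-width types can be illustrated without a C compiler. The following is a minimal sketch using Python's ctypes module, which exposes the C integer types of whatever platform it runs on: the fixed-width types have the same size everywhere, while long and pointers follow the platform. The example address is made up.

```python
import ctypes

# Fixed-width types: the size is part of the name and never changes.
fixed_sizes = {
    "int32_t": ctypes.sizeof(ctypes.c_int32),
    "int64_t": ctypes.sizeof(ctypes.c_int64),
    "u_int8_t": ctypes.sizeof(ctypes.c_uint8),
}

# Platform types: 4 bytes on a 32 bit system, typically 8 on 64 bit Unix.
platform_sizes = {
    "long": ctypes.sizeof(ctypes.c_long),
    "void *": ctypes.sizeof(ctypes.c_void_p),
}

# The classic porting bug: stuffing a pointer-sized value into a 32 bit int
# silently drops the upper bits (ctypes does no overflow checking).
address = 0x100001000                       # hypothetical 64 bit address
truncated = ctypes.c_int32(address).value   # only the low 32 bits survive
preserved = ctypes.c_int64(address).value

print(fixed_sizes)
print(platform_sizes)
print(hex(truncated), hex(preserved))
```

This is the kind of silent truncation that the -Wall warnings tend to point at in C code.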

 

NordU 2003 Talk: StarOffice and OpenOffice in Hanstholm

Thursday, February 13, 2003 16:28 // Aros Congress Center, Västerås, Sweden

by Jens Ole Hald of Hanstholm City

Another city switching away from MS Office: Hanstholm in Denmark. Jens Ole Hald of the city IT department tells us how and why they did it.

A testing group of 15 users evaluated StarOffice for 2 weeks in spring 2002. Since November 2002, 300 employees have been working with StarOffice and OpenOffice.

Most problems were with reading Microsoft formats, but even those were minor and have mostly been fixed in the meantime. Some documents need minor reformatting when opened for the first time, but this is not really a problem. Internal problems with StarOffice were not found.

Users got a 3 hour up-lift course for StarOffice to make them ready for the new tool.

At the moment the workstations are still running on Windows. But they are looking into moving over to Linux.

On the server side they want to stay with Novell. Quote "You have to know and do a lot to make a Windows or Unix box secure. About as much as you have to do and know in order to make a Novell box insecure."

To ease the transition for the Users, the local admins have produced templates and some custom icons and menus mimicking MS Office.

Reasons for changing

The reason for changing was primarily Microsoft's new, more expensive licensing scheme.

Hanstholm was already (or still) using terminal based programs on IBM mainframes and Unix servers. From the Office suite they were mainly using Word and Excel.

Unix was already deployed on the server in certain areas like Web and Proxy Servers.

Initiating the Transition

In summer all employees were invited to a presentation where the head of the city's administration introduced the new application and also made it clear that the decision to move to OpenOffice had been taken and could not be changed. This set the tone, so that the acceptance of the new program was very good and people were mostly interested in learning how to use the product and not in discussing whether they wanted to use it.

Problems

Users who were very experienced with MS Excel had the most problems with the transition, as things in the OpenOffice spreadsheet work slightly differently. But then again it is probably mostly due to them not really accepting the change yet. They will now get a special 1 week introduction to OpenOffice.

 

NordU 2003 Talk: The Future of JVM performance and innovation

Thursday, February 13, 2003 17:16 // Aros Congress Center, Västerås, Sweden

IBM has set up a special group concerned with improving the performance of Java. Robert F. Berry of IBM tells us of their efforts.

JVM innovation is mainly driven by performance enhancements. It started out on the client side, but today Java is really big on the server side.

Java performance on specific hardware has developed into a major selling point.

Performance Improvements

In the memory management area, an enhanced, fully threaded mark/sweep/compact algorithm was developed which uses system idle time for marking and does incremental compaction.
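For readers who, like me, are not fluent in garbage collectors: the three phases can be sketched in a few lines of Python. This is the plain textbook version on a toy object graph, not IBM's threaded or incremental variant; the object names and the `references` table are invented for the example.

```python
def mark(roots, references):
    """Mark phase: find every object reachable from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(references.get(obj, ()))
    return marked


def sweep(heap, marked):
    """Sweep phase: keep only marked objects, in heap order."""
    return [obj for obj in heap if obj in marked]


def compact(survivors):
    """Compact phase: slide survivors together, assigning new slots."""
    return {obj: slot for slot, obj in enumerate(survivors)}


# Toy heap: a -> c -> e is reachable from the root, b -> d is garbage.
heap = ["a", "b", "c", "d", "e"]
references = {"a": ["c"], "c": ["e"], "b": ["d"]}
roots = ["a"]

live = mark(roots, references)
survivors = sweep(heap, live)
addresses = compact(survivors)
print(survivors, addresses)
```

The enhancements mentioned in the talk concern when and how these phases run (marking during idle time, compacting a bit at a time), not what they compute.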

IBM's just-in-time compiler (JIT) uses an aggressive inlining technique which gives the JIT much more code to look at and optimize. Object allocations can be improved by static analysis of their locality, possibly allocating them on the stack and thus also saving on synchronization time.

Restarting a JVM is expensive, but from a transaction isolation point of view it is a useful concept. To make this a viable solution, a JVM start and clean mechanism has been developed where several JVMs share part of their environment. The startup time for an additional JVM has been reduced by about an order of magnitude.

Future Work

Footprint Size

Very Large Heaps > 500 GB

Very Large Systems (n-Way Servers)

Object Pooling (e.g. Jakarta Commons)

Improve decimal arithmetic for banking transactions

Improve performance on XML and XSLT workloads for Web services

Conclusion

I find it rather hard to write a report on a topic I am not really fluent in :)

 

NordU 2003 Keynote: Talking to the Walls

Friday, February 14, 2003 08:32 // Aros Congress Center, Västerås, Sweden

by Mark Burgess

The increasing availability of mobile communication devices changes our society. Appointments become fluid and can be changed up to the last minute as an SMS will inform the other party of the new location in space and time. People who are physically in the same room can easily avoid any communication with each other as they keep connected to their "friends" over their mobile devices.

On a sociological level the availability of mobile communication devices has not yet been integrated into the framework of our social standards. Where is it appropriate to use a mobile phone? Does the availability of a connection to your peer grant you permission to change appointment times at the last minute? Is it acceptable to have a mobile and turn it off?

Challenges for system management in this context are: Diversity as many different technologies will be around until (if) a unified standard emerges. Stability in the face of environmental noise.

We must find new ways of keeping the systems within our realm of responsibility in some organization. Firewalls make little sense in an environment with a wild mix of interconnected private and company devices. VPNs are giving a hint at things to come.

How can we find system management methodologies for diverse, mobile and changing device populations? Looking at the natural behavior of birds (swarms) or ants (hives) gives clues on how organization can work in such environments. Even today kids can be seen to swarm around town governed by the SMS messages they exchange.

For system management this means that strict central control is a thing of the past. Maybe stable existing structures which allow other devices to integrate themselves will work. There is no point in trying to stop the advance of these new technologies. We rather have to integrate them into our environment and adapt the environments to them.

A secure system is one where the risks are known and have been deemed acceptable.

 

NordU 2003 Talk: Injecting RAS into Linux

Friday, February 14, 2003 10:33 // Aros Congress Center, Västerås, Sweden

by Richard Moore of IBM

Richard's group is occupied with getting RAS (Reliability, Availability and Serviceability) into Linux.

In order to get Linux established with previous "Big Iron" customers, a whole new set of requirements becomes important.

As Reliability is not really achievable, the aim is to reach pseudo reliability which means to hide failing elements of the system from other system components, probably taking a performance hit while doing so.

In a 2 CPU machine the failure of a single CPU can be recovered from gracefully by shifting the workload to the one working CPU.

The Serviceability component means that the system must have the means to detect failing components, ideally before they fail completely, and then replace them. Serviceability is not limited to error detection but encompasses all elements which make a system serviceable. So this includes manuals, problem correlation, debugging tools, logging. Compared to what is available for IBM 390 the Linux offering is still in an embryonic state. Many tools are available, but they are often not yet integrated into the mainline kernel, nor is there a consensus on which tools to use. The consensus is still Syslog, which is not easily machine parseable and thus does not lend itself to automation.

The big advantage of Linux is that there is virtually no old code in the system compared to old systems like Windows or big iron machines, which have a rich heritage of old code. Linux developers tend not to shy away from ruthlessly eliminating bad code. They would rather break an interface than keep a bad one around. The effect is that the systems are cleaner and easier to maintain.

The documentation problem gets solved in part by the much better code quality in Linux (due to peer review) and the extreme size of the kernel developer community. Also because source is available a lot of documentation is in the source itself.

Due to the versatile workloads of Linux systems all functionality in the serviceability must be tailored to the specification of the local setup. There is not much use in a mobile phone dumping core into its flash ram.

 

NordU 2003 Talk: FreeBSD 5.0

Friday, February 14, 2003 11:18 // Aros Congress Center, Västerås, Sweden

Murray Stokely of FreeBSD Mall is one of the FreeBSD release engineers. In this talk he talked about FreeBSD development, organization and what is new and cool in FreeBSD 5.0.

Development and release management

Everything is on CVS

There is a current branch and a stable branch. Material from the current branch gets merged back into the stable branch once it has had enough testing.

Over the last 12 months 160 people have committed code directly via cvs to the FreeBSD kernel. Non-committers are welcome to submit patches via the gnats bug tracking system.

FreeBSD is highly organized with elected leadership, developer documentation, release engineering, core team.

Tinderbox environments constantly test the current release.

Release 4.x remains supported for the foreseeable future, as most FreeBSD sites have very high stability requirements.

New Features In 5.0

Support for kernel scheduled entities which leads to better threads

Device file system

Bluetooth and Firewire support

Mandatory Access Control

UFS2 with bigger inodes to store extended attributes

GEOM - modular disk I/O transformation framework

Device monitoring daemon devd to manage PCMCIA and other pluggable devices

Soft Updates (fs enhancement) with snapshots and background fsck

No more perl dependency in the base system

New platforms: ia64 and sparc64

More information is on (www.freebsd.org ...)

 

NordU 2003 Talk: Early Userspace

Friday, February 14, 2003 11:50 // Aros Congress Center, Västerås, Sweden

by H. Peter Anvin of Transmeta

As the Linux kernel developed, the root file system became more important and had to be able to live in more and more different places, like configurable locations on the disk or even on the network. Eventually even in RAM as initrd entered the scene. This caused all sorts of special cases needing handling to support all these variants. And all this is happening inside the kernel which is tough for development as testing is really painful. So the ideal case would be to be able to organize the booting process in user space.

The solution is to have a virtual root for the system, called rootfs, using ramfs code. As the kernel starts, / is rootfs and the actual root filesystem becomes a simple overlay mount. This means that it is possible to use an initial ram disk and get rid of it later, as the kernel threads live in their own world (rootfs).

To simplify matters further, initrd gets replaced with initramfs, which is populated by loading one or several cpio archives. The cpio archives can come from the disk, from the net or even be compiled into the kernel itself. They provide the files required in early user space. To allow initramfs to be small and still provide a useful environment, a special stripped down C library called klibc was developed to provide library code for this case.
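To get a feeling for what such a cpio archive looks like, here is a minimal Python sketch that writes one in the ASCII "newc" layout (a 110 byte header of hex fields, then the file name, then the data, each padded to 4 bytes). That this is the variant the kernel expects is my understanding rather than something stated in the talk, and the single `init` script is just a placeholder.

```python
def _pad4(length):
    """Number of NUL bytes needed to reach the next multiple of 4."""
    return (4 - length % 4) % 4


def cpio_entry(name, data=b"", mode=0o100644, ino=0):
    """Build one 'newc' record: ASCII header, file name, file data."""
    name_bytes = name.encode("ascii") + b"\0"
    fields = [
        ino, mode, 0, 0,        # inode, mode, uid, gid
        1, 0, len(data),        # nlink, mtime, file size
        0, 0, 0, 0,             # dev major/minor, rdev major/minor
        len(name_bytes), 0,     # name size (including NUL), checksum (unused)
    ]
    record = b"070701" + b"".join(b"%08X" % field for field in fields)
    record += name_bytes
    record += b"\0" * _pad4(len(record))   # header + name padded to 4 bytes
    record += data
    record += b"\0" * _pad4(len(record))   # data padded to 4 bytes
    return record


def build_archive(files):
    """Concatenate records for (name, mode, data) tuples plus the trailer."""
    archive = b""
    for ino, (name, mode, data) in enumerate(files, start=1):
        archive += cpio_entry(name, data, mode=mode, ino=ino)
    return archive + cpio_entry("TRAILER!!!", mode=0)


# Placeholder content for the first program run in early user space.
init_script = b"#!/bin/sh\necho early userspace\n"
archive = build_archive([("init", 0o100755, init_script)])
print(len(archive), "bytes")
```

In practice one would use the cpio tool itself; the sketch is only meant to show how little structure there is for the kernel to unpack.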

Programming in this environment is almost as simple as normal userspace programming. Malloc works as expected, file and socket handling is there. Still there are restrictions, as all the rest of the system is not up yet. STDIO is available but it is very slow especially for reading, as klibc does not implement buffering to save space.

Candidates for early Userspace

With this infrastructure the possibilities become endless. The following candidates for early user space come to mind:

Partition detection

Determining the root filesystem type

Network booting

Caveat

Note that this is very much an ongoing development and only available in 2.5.x. (www.kernel.org ...) has more information.

 

My So-Called Life

Monday, February 17, 2003 00:15 // Feldstrasse, Aarburg, Switzerland

For the last few weeks Regula and I have been watching the 1994 drama series My So-Called Life. I totally love this show, too bad it got canceled after only 19 episodes. It's the story of 15 year old Angela Chase, her family and high school friends. I won't attempt to tell the story of the show as it is not really the story which makes it live. It's more the depiction of Angela's "so-called" life, as well as the lives of the people around her.

I seem to have a knack for getting hooked on TV series that get canceled (Farscape is another example). Here I knew what was going to come, as I had bought the DVD box-set long after the fact. All the more amazing to see that the fan community is still alive and kicking (www.mscl.com ...). Because the series ended so abruptly it offered fertile ground for fan fiction, also called episode 20 fan fiction.

After watching the penultimate episode tonight I went on the net to research if the people behind MSCL had done other things I might want to see. The producers of the Series are Ed Zwick and Marshall Herskovitz of the Bedford Falls Production Company named after the setting of "It's A Wonderful Life". They started working together in 1983 and have since produced several award winning shows, unfortunately not all with great ratings. Some died an early death like MSCL. First there was the all around successful thirtysomething, then after MSCL (www.amazon.com ...) there was Relativity which also got rave reviews but low ratings. Now they are back with Once and Again (www.amazon.com ...) where ratings and reviews seem to be more in sync again.

Unfortunately there seem to be no DVDs of thirtysomething available, but the first season of Once and Again was just released on DVD, so there is at least something to console me when the last episode of MSCL will be watched tomorrow night.

I also found a lengthy article about Zwick and Herskovitz and Bedford Falls at (www.angelfire.com ...)

 

Windows Blues

Wednesday, February 19, 2003 00:01 // ETZ J97, ETH, Zurich

Our department is taking part in the ETH Laptop Project. This means, we are helping our students to make better use of their laptops. Currently this means we are developing a Linux and a Windows setup tailored to the requirements of our students. These setups will make it simple for them to integrate their laptops with our Unix Environment. We also have struck a deal with IBM which offers the students IBM Laptops at competitive prices and we will put our own Windows and Linux on these boxes.

Today I have been trying to get the IBM Windows XP installation which is already on the laptop when the students buy it, into a form so that it contains all the latest security stuff and fixes from MS and updates from IBM as well as our locally developed packages. When all the stuff was in, I used the sysprep tool to 'reseal' the machine, so that when the students boot it, it will come up with the usual short setup where the user can define the admin password and has to enter the serial number. Well, that was the plan at least. When I tried to reboot after the sysprep step, Windows came only halfway up and then complained, that setup could not continue because two processes were accessing the registry. BOOM. Reboot.

Over the course of the day I tried the whole spiel in many variations, searched the web, hunted through newsgroups. As every try took about 40 minutes, this problem was really painful to debug. Eventually and counter to all I expected, it finally worked. Unfortunately I had twisted and turned so many knobs that I am not sure which one was actually responsible for the sudden success. So tomorrow I will be at it again, trying to verify my recipe for success.

I am so glad that I can mostly work with Unix systems and only have to use Windows occasionally. Whenever there is a problem with a Windows box I feel like I am forced to wear thick winter gloves while trying to repair a watch, blindfolded and with someone occasionally moving the watch around.

But hey, I am stronger than Windows! Eventually it sits up and begs for food, but the process always is extremely annoying.

 

Excessive Retransmits

Wednesday, February 19, 2003 22:31 // ETZ J97, ETH, Zurich, Switzerland

Today around 9am our main Solaris server started acting up. Its performance got patchy. We eventually found that it was suffering from excessive TCP retransmits of up to 1000%. This means that for each packet it sends out on the net it has to try about 10 times until it is successful. This is an extremely high value, or so Virtual Adrian tells us.
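The arithmetic behind that figure, as a minimal Python sketch. It assumes the percentage is simply retransmitted segments relative to original data segments; how the monitoring tool really computes it I have not checked, and the counter values are invented.

```python
def retransmit_percent(segments_sent, segments_retransmitted):
    """Retransmitted segments as a percentage of original segments."""
    return 100.0 * segments_retransmitted / segments_sent


def retransmissions_per_segment(percent):
    """Average number of retransmissions behind each original segment."""
    return percent / 100.0


# Invented counters: 500 original segments, 5000 retransmissions.
percent = retransmit_percent(500, 5000)
per_segment = retransmissions_per_segment(percent)
print(percent, per_segment)
```

So under this assumption 1000% means roughly ten extra copies of every segment on the wire, which matches the "about 10 tries" reading above. A healthy LAN should sit far below one percent.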

We started searching frantically for the reason of the problem, as performance on the server and even more on its clients was suffering badly. After about one hour of web hunting and traffic dumping, we gave in to the pressure from the street and rebooted the beast, hoping that some internals of the kernel had been thrown out of whack and after a reboot all would be well. And indeed it was, at least for a few minutes. Then the server started misbehaving again, driving its TCP stack through the roof. As rebooting did not help, we went back to tcpdumping and etherealing. I did learn a lot about pcap filter syntax ...

'tcp[tcpflags] & (tcp-rst) != 0 && tcp[tcpflags] & (tcp-ack) = 0'

but nothing about the reason for the retransmits. Fortunately, at this stage, the retransmit rate was not always at 1000% so work was possible for our users.
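As I read it, the filter above selects TCP packets that have the RST flag set and the ACK flag cleared. A minimal Python sketch of the same bit test on the TCP flags byte, using the standard flag bit values; the function name is made up.

```python
# Bit positions of the flags in the TCP header flags byte.
TCP_FIN = 0x01
TCP_SYN = 0x02
TCP_RST = 0x04
TCP_PSH = 0x08
TCP_ACK = 0x10


def is_bare_reset(flags):
    """True if RST is set and ACK is not, mirroring the filter expression."""
    return (flags & TCP_RST) != 0 and (flags & TCP_ACK) == 0


examples = {
    "RST": TCP_RST,
    "RST+ACK": TCP_RST | TCP_ACK,
    "SYN": TCP_SYN,
    "PSH+ACK": TCP_PSH | TCP_ACK,
}
for name, flags in examples.items():
    print(name, is_bare_reset(flags))
```

The filter does the same masking directly on the packet bytes, so only the packets of interest ever reach the dump.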

Then, in the early afternoon, Manuel found that the root disk of the server was causing SCSI timeouts. As if we didn't have enough on our hands already. SCSI timeouts make the machine stop and wait for several seconds at a time. Together with the server, most people using its resources were experiencing the same freezing problem on their workstations.

What a day. I have been writing emails to our users about what was happening all day long, but things were really starting to look bad. Our wonderful reputation for high quality service and superb uptime was going down the drain. It seemed though that most users were not blaming us at this stage, probably due to the fact that I kept them up to date with what was happening.

Around that time David found that the latest Solaris kernel patch contained a fix for some TCP stack issue which might be related to the retransmits we were still suffering from. He started to put in this patch so that we could activate it when we rebooted. This was going to be necessary anyway, as I was preparing to replace the root disk with a fresh device.

Then, suddenly just minutes before the reboot, the server went back to normal, the retransmits were gone and performance was good again, no traces left.

So here I am, another day older and not much smarter about what was causing today's network problems. I can imagine things like a bug in the Solaris TCP stack which can be triggered by a rogue packet and would cause the symptoms we experienced today, but I suspect that once the real reason is known, it will be way less spectacular.

 
