NFS related news
Always Bet AGAINST Your Employees
Recently a group of NetApp managers tried to motivate their employees by betting that the team could meet a particular project milestone. If the team missed the date, the managers promised to dye their hair bright colors. (Silly hair bets have a long history at NetApp.)
Unfortunately, the team missed the date, and as a result, the managers not only had to deal with bad news, but they had brightly colored hair to advertise it. When people would ask, “What happened to your hair?” the managers had to say, “I bet my team that they would hit their schedule, but they missed.”
Even if there are perfectly good reasons for the miss, the listener might still conclude: You not only look like a clown, but you manage a whole team of clowns.
Much better to bet that your team will fail. When I first started running engineering (back in 1999), product reliability was an issue, so I bet my employees that they could not improve quality by a factor of 10. If they did, I would dye my hair any color that they wanted.
Imagine how this played out. The team succeeded (as measured by a man who cared so deeply about quality that I knew he would never cheat), and I ended up with hair that was bright magenta, blue, and red. To get hair these colors you can't put the dye on top of brown; you have to start by bleaching to white. So I looked like a freak, but at least, when people asked me about it, I had a good answer: “Let me tell you about my awesome team. I should never have bet against them.”
The conclusion: You look like a clown, but at least you manage a good team.
So I say, always bet against your employees. Of course, it helps if they understand it’s a bet you want to lose.
Why NetApp Is Winning
NetApp recently released financial results, and according to one analyst, “It was a supersonic quarter.” Never mind the financial details – I think the more interesting question is: What’s going on? Why is NetApp winning? (Those who like details can check here. The quick summary is that our revenue was flat from a year ago, up 9% from the previous quarter. Profitability has also recovered.)
Two interesting trends are combining to boost our growth. (1) Many companies have an aging IT infrastructure and can no longer postpone capital purchases. (2) Server virtualization forces companies to consider new IT architectures, and that plays well to NetApp’s strengths.
When the dot-com crash hit, back in 2000/2001, many companies had recently updated their IT infrastructure as part of Y2K remediation projects. Internet companies and their suppliers had been growing so quickly that most of their equipment was new, and many of them went out of business so the market was flooded with “almost new” gear. It was a tough recovery for IT vendors. This time is different. This downturn began gradually, perhaps in 2007, so people have already been postponing capital purchases for quite some time. Partly this is intuition, based on talking with customers, but one metric we can track is one-year service renewals. In good times, people upgrade their equipment when their initial service contract expires. In bad times, they want to defer capital spending, so they simply extend the service contract by a year. In the past year, we have seen many one-year extensions, and we believe this reflects lots of pent up demand.
So the first key trend is that IT infrastructure is aging, and eventually companies will need to buy.
The second key trend is that server virtualization is forcing people to consider radically different architectures. I’ve had many customers tell me that their initial goal was to install VMware, but they discovered that in order to get the full benefit, they first needed to re-architect their storage infrastructure. NetApp was an early leader in storage for virtualized environments, so in many cases people have switched to our storage even when they were completely satisfied with their previous vendor. It’s not that the old vendor did anything wrong; it’s just that their products didn’t work so well with server virtualization. Virtual servers can be created quickly, so you need storage technologies like thin provisioning and cloning that let you provision storage equally fast. Virtual servers proliferate, so storage efficiency techniques like data deduplication become all the more important. The whole point is to save money, so good automation is critical. NetApp’s strengths perfectly match the requirements of server virtualization.
It’s hard to be sure what’s happening. It could simply be that IT spend is picking up, and a rising tide floats all boats. If this is true, then you should expect similarly strong quarters from our competitors. On the other hand, it could be that NetApp’s strength in virtualized environments is allowing us to get much more than our fair share. It’ll take a few more quarters of results to be sure, but this is what we think is going on.
The Petabyte Era is Dead
Sometimes numbers have a symbolic value beyond any real meaning they contain. Like when your odometer rolls over to a hundred thousand miles. Of course, it’s all arbitrary, since a mile is just a certain number of Roman soldier footsteps, but even so, in a funny way, it feels like a major achievement.
At NetApp, we just had one of those magic moments: in the past 52 weeks we have shipped an exabyte of storage. That’s a thousand petabytes. Or 1,000,000,000,000,000,000 bytes. Look at all of those zeros! It has no real meaning, but I still love it.
The petabyte era is dead. Long live the exabyte era!
"I've Looked At Clouds From Both Sides Now" (A Non-Technical Definition of Cloud Computing)
I got a reader comment so perfect that it calls out to be shared:
I'm not the most technical tool in the shed, but I read a lot, and I have been reading a lot about cloud computing – and the lyrics of a song keep playing in my head:
"I've looked at clouds from both sides now
From up and down, and still somehow
It's cloud illusions I recall
I really don't know clouds at all"
Cloud Computing – how does it affect me, the person on the receiving end (security issues, reliability)? The definition of Cloud – Webster's dictionary:
· Something that darkens or fills with gloom.
· A dark region or blemish – something that obscures.
Again, average person with concerns.
Posted by: Evelyn Lindquist | October 27, 2009 at 10:41 AM
I’ve noticed that technical people sometimes love definitions that are so intricately detailed that non-technical people – including most business people – can’t understand them. I think Evelyn’s comment is a sign that cloud computing is suffering from this disease.
There are good reasons for people providing cloud services to dig into the technical details of what they are doing and how they are doing it, but I think that we need much simpler definitions for people using cloud services.
I just can't let Evelyn's definitions stand (gloom, dark region, blemish), so here is my attempt at a simple, non-technical definition of cloud computing:
If I am a business person and I have a business problem that can be solved with IT, there are two ways to go about it. The traditional way is to choose an application, find some hardware to run it on, find some storage for it, find space and power in my data center, find people to operate it, and so on. The cloud computing way is to find a service on the internet instead of an application I run myself, and to let someone else handle all of those other steps. I don’t own a data center, buy any equipment, or operate anything. All my capital expenses are converted to a service fee, or – more common in clouds for consumers – the service is free because I have to look at advertisements.
I’m not saying that technical people should ignore the important differences between various cloud approaches (Software-as-a-Service, Infrastructure-as-a-Service, Platform-as-a-Service, Storage-as-a-Service, Internal/External/Private/Public). But I do believe that we’ve got to figure out how to hide as many of these intricacies as possible from the non-technical people who just don’t care.
Cloud/Grid/Utility: Definitions Drift Because IT Is In Denial About Outsourcing
Why did the term cloud computing so quickly lose its original meaning? At first, cloud computing was about how to not build a data center, but it quickly morphed into an architectural description of how you should build a data center. The first definition is about accessing IT services over the Internet from computing resources that somebody else owns and operates. The second definition is about building your own hyper-efficient data center based on virtualization, shared resources, and dynamic provisioning. These definitions are so different – contradictory even – that it takes modifiers like external, public, internal, and private to tell them apart. To many, it seems that we have sunk into a morass of confusion.
My point here, though, is not to debate definitions or technology. (I believe that both internal and external clouds will be wildly successful for many years to come.)
My question today is why the definition got muddled so quickly, especially since this isn’t the first time it has happened. The original idea of utility computing was that computing should be like electricity. In the early days, companies had their own generators, but over time, centralized power companies replaced them. The theory was that IT should evolve in the same way. This same metaphor also inspired grid computing, but as with cloud computing, the definitions quickly shifted to data center architectures. I remember many data center tours where the proud owner of rack after rack of Linux nodes would say, “Check out my compute grid.”
What is going on here? I believe that IT is in denial. CEOs and CIOs are like ships passing in the night.
CEOs ask, “Why can’t we just convert to cloud computing?” What they mean is this: “I’m tired of expensive data centers, high capital costs, and hard to manage infrastructure. Why can’t someone else do all of that for us and we just buy it as a service over the Internet? You know, like Yahoo! email or Salesforce.com?” In other words, CEOs would like to outsource big chunks of IT, just like they have already outsourced big chunks of manufacturing.
And then the CIO comes back and says, “I figured it out. We can convert to cloud computing, but the good news is that we still get to build the data center and buy the IT equipment ourselves.” Ships in the night.
IT departments are so averse to the idea of having their jobs outsourced – who wouldn’t be! – that whenever someone tries to define a term to mean exactly that, they redefine it to mean a new thing that they get to build and run themselves. Perhaps soon there will be a creative new definition of external cloud that somehow means you build it yourself.
As I said above, I believe that both internal and external clouds will be wildly successful. For at least the next five years, more likely ten, most CIOs will run a hybrid model consisting of three main parts: (1) Traditional silos, where an application, a server, and storage are purchased and installed together; (2) Internal clouds, which will initially run less critical apps and grow over time; and (3) External clouds, which will also start low and move up.
Why Is NetApp's New Data Center So Efficient? Big Fans and Hot Air
The big problem of data centers is getting the heat out. In a typical data center, for every kilowatt you put in to run your equipment, you have to spend another kilowatt on air-conditioning to pull the heat out. In other words, it actually takes 2 kW of power to run 1 kW of equipment. This is a Power Usage Effectiveness (PUE) of 2.0.
NetApp's new data center has a PUE of 1.2, which cuts the total power requirement almost in half. At current electricity prices, we'll save over $7 million a year. We use many techniques to achieve this, but I have two favorites: big fans and hot air.
First for the big fans: our data center moves enough air to fill the Goodyear Blimp in three seconds. To get a mental picture of this, imagine an endless series of blimps flying out of the building – every three seconds another blimp. The harder you blow, the faster you cool.
The hot air technique is more subtle. Most data centers cool air to 55 or 60 degrees, but our team figured out how to get effective cooling with 74-degree air. Less cooling equals less energy. The trick is to manage airflow carefully. Rack mounted equipment sucks air in the front and blows it out the back maybe twenty degrees hotter. Most data centers have rows and rows of racks in a big open room, so the air gets all mixed up. It works better to deliver cooler air directly to the front of the rack and collect the hot air from the back. In our previous data center, we experimented with plastic shower curtains to separate the “cool aisles” at the front of the racks from the “hot aisles” at the back, and it worked wonderfully. In the new data center, we went a step further and used drywall to build airtight cool aisles, which we pressurize to speed airflow through the equipment and ensure that the hot air can’t get back around to the front. The hot aisle can get up to 95 degrees, which is uncomfortable, but that’s okay – it’s optimized for equipment and energy, not for people.
In RTP, the outdoor temperature is 74 degrees or less 67% of the time, which means we can usually use outside air with no cooling at all. Our data center has more cooling capacity than the Empire State Building, but our goal is to leave it off.
It didn’t take exotic technology to achieve this result. The design and airflow are unusual, but we used ordinary air conditioning units, heat exchangers, and so on. In fact, we reduced the cooling capacity by about 20% so this technique is not only cheaper to run but also cheaper to build. If you want to learn more, we welcome visitors.
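The PUE figures in this post can be checked with a few lines of arithmetic. This is only a sketch in Python: the function names and the 1,000 kW equipment load are made up for illustration, and only the PUE values of 2.0 and 1.2 come from the text.

```python
def total_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power implied by an equipment load and a PUE.

    PUE is total facility power divided by equipment (IT) power.
    """
    return it_load_kw * pue


def savings_fraction(old_pue: float, new_pue: float) -> float:
    """Fraction of total facility power saved when PUE drops, for the same equipment load."""
    return 1.0 - new_pue / old_pue


if __name__ == "__main__":
    it_load_kw = 1000.0  # hypothetical equipment load, chosen only for illustration
    for pue in (2.0, 1.2):
        print(f"PUE {pue}: {total_power_kw(it_load_kw, pue):.0f} kW total")
    print(f"Reduction in total power: {savings_fraction(2.0, 1.2):.0%}")
```

For the same equipment, the total drops from 2,000 kW to 1,200 kW, a 40% reduction, while the overhead beyond the equipment itself shrinks from 1,000 kW to 200 kW.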
What Moves Markets: The Seven Hundred Million Dollar Man
Last week we had our Analyst Day in New York City, which means that we met with a theater full of Wall Street financial analysts and explained our company to them. You can learn a lot about how Wall Street thinks by watching how our stock price changed during the different presentations. (You can watch it yourself here.)
The first four speakers described our strategy, why we have been winning recently, and why we believe we are very well positioned to take advantage of trends in the market like cloud computing and industry consolidation – Tom (our new CEO) on the big picture, me on our vision of data centers and clouds, Manish on our product strategy, and Rob on our sales strategy.
Finally, Steve Gomo, our Chief Financial Officer, gave projected financial results for the next two quarters. In particular, he showed a slide saying that by Q3, we expect to be back to our normal operating profit of 16%.
We tracked the stock ticker throughout the presentations, and here’s how Wall Street responded. After the four of us did our absolute best – two hours of details – to explain what we are doing and why we will win, the stock had moved a total of three cents. In the first ten minutes of Steve’s talk, the stock went up a dollar. It’s clear what matters to Wall Street. Never mind technology, long-term strategy, or market position; what moves markets is short-term earnings. (Not new news, of course, but this is a particularly graphic illustration.)
By the next day, the stock was up about two dollars. Perhaps the extra dollar came from the other four presentations, and it just took a while for the analysts to digest the details, but – realistically – that was probably Steve’s doing as well. Since we have 350 million shares, Steve’s short talk – maybe just that one slide – drove NetApp’s value up by seven hundred million dollars. If you look just at the first ten minutes of his talk, when Steve got the first dollar, he was increasing our market cap at the rate of two billion dollars an hour. What power: mover of markets and creator of value. Steve Gomo, the seven hundred million dollar man!
And yet, when he got home from the meeting, late that night after a cross-country flight, the first words Steve heard were: “Honey, I need you to take a look at this sink. The spray hose is leaking like crazy.” From star to plumber in six seconds.
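For anyone who wants to check the numbers in this post, here is the arithmetic as a short Python sketch. The function names are invented for illustration; the share count, the price moves, and the ten-minute window are the ones quoted above.

```python
SHARES_OUTSTANDING = 350_000_000  # share count stated in the post


def market_cap_change(price_change_usd: float, shares: int = SHARES_OUTSTANDING) -> float:
    """Change in market capitalization for a given change in share price."""
    return price_change_usd * shares


def hourly_rate(value_change_usd: float, minutes: float) -> float:
    """Scale a value change observed over `minutes` to a per-hour rate."""
    return value_change_usd * 60.0 / minutes


if __name__ == "__main__":
    next_day_gain = market_cap_change(2.0)      # stock up about two dollars
    first_ten_minutes = market_cap_change(1.0)  # the first dollar
    print(f"Next-day gain: ${next_day_gain:,.0f}")
    print(f"Rate during the first ten minutes: ${hourly_rate(first_ten_minutes, 10):,.0f} per hour")
```

Two dollars on 350 million shares is $700 million, and one dollar in ten minutes works out to $2.1 billion an hour, which the post rounds to two billion.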
Server Virtualization: The Fastest “Major Infrastructure Transition” Ever
Server virtualization represents a major infrastructure transition on par with past transitions, like the ones from mainframe to mini (e.g. VAX), mini to UNIX, and most recently UNIX to Wintel. (I don't mean the rise of Wintel on the desktop, although that's also an interesting trend, but the emergence of Wintel running critical business apps in high-end corporate data centers.)
At VMworld last month, a customer mentioned that he felt this transition was harder and slower than the earlier ones. I know the customer is always right, but I disagreed completely and I told him so. It seems to me that VMware is moving much more quickly into high-end mission critical environments than either UNIX or Windows did.
By coincidence, I had been talking with Steve Herrod, VMware's CTO, about this subject just the day before because the two of us gave a talk together at VMworld. He argued that what has allowed this rapid transition is that you don't have to re-write your applications to take advantage of the new model. Insightful. In some ways, the “virtualized data center” model is very different, but in other ways – in terms of the environment visible to the application itself – it is almost identical. Even when application vendors claim that their product won't run under VMware, it often works just fine. I can't think of any example where a UNIX application just accidentally happened to run under Windows even though the vendor said it wouldn't. There is certainly no U2W comparable to the P2V (physical to virtual) tools now available.
In fact, Steve's observation almost made me question whether server virtualization constitutes a “real” transition equivalent to the others. On reflection, I conclude that it does. Like the others, this transition is driving major changes in how people build and manage data centers, and it enables radically different business models at much lower price points. Also, it really isn't fair to think of server virtualization as being the whole trend. I mean, it is certainly at the center, but when you consider “cloud computing” or the “dynamic data center” – whatever you want to call this new shared infrastructure architecture that's emerging – there is much more going on than just Xen or Hyper-V or VMware. (We certainly think that storage has a critical role to play. See here and here.)
I think the primary motivation for people to change so quickly is the dismal economy, but Steve is absolutely right that the ability to easily migrate existing apps is what enables this to be the fastest major infrastructure transition ever.
Oracle is the “Crazy Ivan” of the IT Industry
If IBM had purchased Sun, things would have been easier to predict. I imagine the conversation in IBM’s boardroom going something like this: “They have a RISC chip; we have a RISC chip. Let’s phase theirs out. They have a UNIX; we have a UNIX. Let’s phase theirs out. They have Java, we have – uh – let’s keep that one!” And so on.
With Oracle, the outcome is not at all clear. Oracle is so different from Sun that they could keep pretty much everything. What parts of Sun are important depends on what kind of company Oracle is trying to become. Do they want to become a full-line systems vendor competing head-to-head with IBM and HP? Do they just want Java and MySQL, which are so closely related to their current product lines and strategies? Perhaps something more secret and clever than anyone else has thought of yet?
There’s just no telling, because Oracle is – and I mean this in the most positive and respectful way – the Crazy Ivan of the IT industry.
Authors@Google Talk for How To Castrate a Bull
I did one of those “Authors at Google” talks for my book. Here’s the talk link.
The audience was mostly engineers with a handful of managers. If that’s you, know that I prepared my presentation with you in mind.
AND – Why Customers Work With NetApp
I've asked many customers why they chose to work with NetApp. We've also done some formal market research on the subject. The answers mostly fall into three categories.
In some cases, customers are happy with what their IT infrastructure accomplishes for them, but they want to improve efficiency and drive out costs. NetApp has a variety of technologies to help customers store more data on fewer spindles, and we also use automation to allow more efficient storage management.
In other cases, customers want a more flexible IT infrastructure that lets them respond quickly to changing business requirements. Here the focus is on capabilities like fast and easy provisioning to get new application environments installed quickly, or cloning to get new test and dev environments running quickly. Automation helps here as well, by providing application administrators self-service access to storage system capabilities.
Finally, customers sometimes want an experienced partner to help guide them. This is especially important for customers exploring newer, more innovative architectures based on server virtualization or cloud computing. One customer told me: “We expect you to help us predict the future.”
The thing is, few customers are satisfied with just one of these. Business flexibility may be the initial impetus to consider a new architecture, but they also want to run their IT infrastructure more efficiently. And they also want experienced partners who can help them set their course. Customers may prioritize one area over the others, but they'd rather not compromise on any of them.
This is the inspiration behind our new ad campaign. For most customers, it's not about just one thing; it's about getting them all at once: AND.
Beating my head against named on Fedora!
I added 3 new subdomains to my home network for testing. I added the records to my chroot'ed named at /var/named/chroot/var/named/named.conf. I just did reverse pointers and I couldn't get it to work:
    [root@adept var]# host 192.168.4.120
    Host 120.4.168.192.in-addr.arpa. not found: 3(NXDOMAIN)

I did this with a simple Perl script, so I debugged the heck out of it and checked for tabs galore. I finally added forward lookups, which worked:

    [root@adept var]# host blast-4-120
    blast-4-120.internal.excfb.com has address 192.168.4.120

Heck, I've been burnt by a bad link in /etc before, so I checked it:

    [root@adept var]# ls -al /etc/named.conf
    lrwxrwxrwx 1 root named 21 2008-02-25 16:15 /etc/named.conf -> /var/named/named.conf
    [root@adept var]# ls -la /var/named/named.conf
    lrwxrwxrwx 1 root named 38 2008-02-25 16:24 /var/named/named.conf -> /var/named/chroot/var/named/named.conf

I even diff'ed them to be really, really sure. I ran named manually with '-g', fixed the warnings I got and then found out it didn't handle the chroot nicely. I looked at the init file and gave up on understanding it.
I couldn't find a log file for it, so I sent a SIGHUP to look for a database dump. I added logging to the config file and never saw any output. I never found that database dump.
But I did find an option that said where it should be:
    dump-file "/var/named/data/cache_dump.db";

I then asked myself, is there another copy of the config file?
    [root@adept var]# ps -ef | grep named
    named 4047 1 0 00:30 ? 00:00:00 /usr/sbin/named -u named -t /var/named/chroot
    root 4207 2886 0 01:00 pts/3 00:00:00 grep named
    [root@adept var]# cd /var
    [root@adept var]# find . -name named.conf
    ./named/named.conf
    ./named/chroot/etc/named.conf
    ./named/chroot/var/named/named.conf
    [root@adept var]# ls -la ./named/named.conf
    lrwxrwxrwx 1 root named 38 2008-02-25 16:24 ./named/named.conf -> /var/named/chroot/var/named/named.conf
    [root@adept var]# ls -la ./named/chroot/etc/named.conf
    -rw-r--r-- 1 root named 2741 2008-02-25 20:49 ./named/chroot/etc/named.conf

Why yes, yes there is and it doesn't have my new zones!
    [root@adept etc]# pwd
    /var/named/chroot/etc
    [root@adept etc]# mv named.conf named.conf.fracked
    [root@adept etc]# ln -s ../var/named/named.conf .
    [root@adept etc]# ls -la ../var/named/named.conf
    -rw-r----- 1 root named 4920 2009-08-27 00:13 ../var/named/named.conf
    [root@adept etc]# service named restart
    Stopping named: [ OK ]
    Starting named: [ OK ]
    [root@adept etc]#
    [root@adept etc]# host 192.168.4.120
    120.4.168.192.in-addr.arpa domain name pointer blast-4-120.internal.excfb.com.

Now what was I doing before I fell down this rat hole?
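For next time, the hunt for stray copies could be scripted. The following is a minimal sketch in Python, not something I ran during this session: find_configs and group_by_real_file are hypothetical helper names, and /var/named is simply the layout shown above. It lists every named.conf under a directory and groups the paths by the file each one resolves to once symlinks are followed.

```python
import os


def find_configs(root: str, name: str = "named.conf") -> list[str]:
    """Return every path under root whose basename is `name` (regular files and symlinks)."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def group_by_real_file(paths: list[str]) -> dict[str, list[str]]:
    """Group paths by the file they resolve to after following symlinks."""
    groups: dict[str, list[str]] = {}
    for path in paths:
        groups.setdefault(os.path.realpath(path), []).append(path)
    return groups


if __name__ == "__main__":
    # /var/named matches the layout in this post; adjust for other systems.
    for real, aliases in group_by_real_file(find_configs("/var/named")).items():
        print(real)
        for alias in aliases:
            print("    <-", alias)
```

More than one group means more than one distinct file, which was exactly the problem here. One caveat: the script resolves absolute symlinks against the real root, whereas a named started with -t resolves them against the chroot directory, so the grouping is a hint rather than proof of what the daemon reads.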
Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily
Three “Cloud Strategies” Every CIO Should Consider
NetApp launched a collection of cloud initiatives today, but I want to step back and ask what cloud means to CIOs. Every CIO I meet seems to want a “cloud strategy,” but the words mean different things to different CIOs. I have discovered that cloud strategies fall into three broad categories:
- Cloud Provider: Sell IT services over the Internet. Companies often consider this strategy when they have especially strong IT capabilities. Telephone companies, for instance, know how to run large, reliable data centers, so providing IT services to others is a natural fit. Data centers that provide cloud services typically use virtualization to host many customers in a single shared infrastructure.
- Cloud Customer: Buy IT services over the Internet. These CIOs don’t necessarily want to eliminate existing data centers, but they wonder whether—for at least some of their IT services—it would be possible to let someone else build the data center, buy the hardware, and run the applications.
- Internal Cloud: Build shared, virtualized infrastructure for internal users. These CIOs want to continue operating their own data centers, but they would like to use the same virtualization techniques as cloud providers to make IT more efficient.
In some ways cloud providers have it easy, because most are designing a new data center architecture from scratch. In fact, much of today’s launch focuses on what we’ve learned from working with large cloud providers like Oracle (for its Oracle On Demand business), Yahoo! (for email), and T-Systems (for its hosting business).
Most CIOs don’t have the luxury of redefining their IT architecture from scratch – they must make incremental changes to an existing strategy. A hybrid model often makes sense. They build an internal cloud and begin migrating less mission critical applications onto it. Or they identify a set of applications that can move to the external cloud. These strategies are not mutually exclusive. It’s perfectly reasonable to build an internal cloud to host the random “garbage applications” that most businesses seem to accumulate while at the same time outsourcing Oracle infrastructure to Oracle’s On Demand cloud.
As I described in this blog entry, NetApp’s goal is “to be the storage technology partner of choice for companies building cloud-compute environments,” and our launch today describes how we do that. There are many technology components, like Data ONTAP 8 and NetApp Data Motion, but to me what’s most important is that we have defined an effective cloud architecture (the NetApp Dynamic Data Center) along with workshops and services that help customers transform their data centers.
We have learned an amazing amount by partnering with many of the largest cloud providers in the world, and our goal now is to help others take advantage of this experience.
Tom Georgens, NetApp's New CEO
Today Dan Warmenhoven, our previous CEO, announced that Tom Georgens is our new CEO. Dan will continue as Chairman of the Board, and he will also have a new role, reporting to Tom, focusing on relationships with major partners. His title is Executive Chairman since he's an executive of the company as well as a board member.
Dan has had an astounding run. When he joined NetApp in 1994, we had forty-five employees and less than $10 million a year in revenue. Fifteen years later, we have eight thousand employees and $3.4 billion in revenue. For all that we’ve achieved, though, Dan told me that the accomplishment that meant the most to him is being selected as the #1 Best Company To Work For by Fortune Magazine. He has always believed that a healthy culture is just as important as healthy financials.
If Dan is so great, why do we need a new CEO? Dan has had a long-term personal goal of retiring by age sixty – about a year from now. In fact, his original goal was to retire by fifty, but running NetApp was so much fun that he pushed his goal back by ten years. The general timing came from Dan's retirement goal, but this particular quarter felt right because the economy seems to be leveling out. Things certainly aren't back to normal, but the crazy freefall appears to be over and a collapse of the banking system is no longer imminent. It’s best not to change CEOs when a crisis is in full swing.
It’s hard to imagine – after fifteen years of working for the same person – that as of today, I have a new boss. Fortunately, I’ve had plenty of time to get used to the idea. The board didn’t take their final vote until just this Monday, but we’ve been planning this transition for quite a while. When we hired Tom four years ago to run Product Operations – engineering, manufacturing, and product management – we knew that he was a potential successor to Dan. There was no guarantee, but it was something we discussed in our interviews with Tom, and part of what attracted him to join us was the opportunity to become CEO. I’ve gotten to know Tom pretty well over the years. I like how he thinks, but perhaps more important, I like how well he fits our culture. He is a great choice for this role.
To people who know how to read tea leaves, it became clear that Tom was the heir apparent when we promoted him to President and Chief Operating Officer (COO) and Dan became Chairman of the Board as well as CEO. Not only did most of the company report to Tom at that point, but Dan also asked him to run our planning process to develop annual goals and budgets. That combination of moves is a pretty strong signal of what's coming next.
For NetApp this is obviously a major transition, but in fact, the entire IT industry is in a period of transition. Trends like industry consolidation, server virtualization, and cloud computing are shaking up the IT landscape, and everyone needs to respond. With so much change all around, it feels somehow appropriate to have a new CEO. I've loved working with Dan – I have so much respect for him and I've learned so much from him – but it's also exciting to have a new leader. Here's to a new era at NetApp!
Removes are failing, time to debug
Removes are failing and I don't think this is the classic issue we have had with this operation. I know that at the last BAT, the removes were succeeding enough to cause a panic. And we don't even seem to be getting to that code.
Snoop shows the client sending the remove, but the MDS is not sending anything to the DS. Barring the MDS sending to the wrong DS, I've put together a simple DTrace script to see what is going on:
    [root@pnfs-17-24 ~]> more remove.d
    #!/usr/sbin/dtrace -s

    ::do_ctl_mds_remove:return
    {
        printf("rc = %d", args[1]);
    }

    ::ctl_mds_clnt_remove_file:return
    {
        printf("rc = %d", args[1]);
    }

    :::nfss-e-vop_fid_pseudo_failed
    {
    }

    :::nfss-i-layout_is_null_cannot_remove
    {
    }

    ::mds_op_remove:entry
    {
        self->resop = (nfs_resop4 *)arg1;
        self->resp = self->resop->nfs_resop4_u.opremove;
    }

    ::mds_op_remove:return
    {
        printf("rc = %d", self->resp.status);
    }

I don't like the hoops I have to go through to get the status on mds_op_remove, but I have to do that because that function is of type void. I'm going to change them to return a status. It doesn't cost much and makes simple DTrace scripting easier.
Anyway, here is our result:
    [root@pnfs-17-24 ~]> ./remove.d
    dtrace: script './remove.d' matched 6 probes
    CPU     ID                    FUNCTION:NAME
      0  30503 do_ctl_mds_remove:nfss-e-vop_fid_pseudo_failed
      0  62050 do_ctl_mds_remove:return rc = 28
      0  62052 mds_op_remove:return rc = 0
    ^C

What is that 28?

    [thud@ultralord nfssrv]> grep 28 /usr/include/sys/errno.h
    #define ENOSPC  28      /* No space left on device */

Now I want to know what vop_fid_pseudo() is returning, but only when called in this context. I can use the thread properties like this:
    ::vop_fid_pseudo:entry
    /self->live == 1/
    {
            self->vp = (vnode_t *)arg0;
            self->fidp = (fid_t *)arg1;
    }

    ::vop_fid_pseudo:return
    /self->live == 1/
    {
            printf("fid len = %d\trc = %d", self->fidp->un._fid.len, args[1]);
    }

Only if self->live is set will I see output:
    [root@pnfs-17-24 ~]> ./remove.d
    dtrace: script './remove.d' matched 8 probes
    CPU     ID                    FUNCTION:NAME
      0  62468 vop_fid_pseudo:return fid len = 10     rc = 28
      0  30503 do_ctl_mds_remove:nfss-e-vop_fid_pseudo_failed
      0  62050 do_ctl_mds_remove:return rc = 28
      0  62052 mds_op_remove:return rc = 0
    ^C

And now I need to know where that error is set:
    49  vop_fid_pseudo(vnode_t *vp, fid_t *fidp)
    50  {
    51          struct vattr va;
    52          int error;
    53
    54          error = VOP_FID(vp, fidp, NULL);
    55
    56          /*
    57           * XXX nfs4_fid() does nothing and returns EREMOTE.
    58           * XXX nfs3_fid()/nfs_fid() returns nfs filehandle as its fid
    59           * which has a bigger length than local fid.
    60           * NFS_FH4MAXDATA is the size of
    61           * fhandle4_t.fh_xdata[NFS_FH4MAXDATA].
    62           *
    63           * Note: nfs[2,3,4]_fid() only gets called for diskless clients.
    64           */
    65          if (error == EREMOTE ||
    66              (error == 0 && fidp->fid_len > NFS_FH4MAXDATA)) {
    67
    68                  va.va_mask = AT_NODEID;
    69                  error = VOP_GETATTR(vp, &va, 0, CRED(), NULL);
    70                  if (error)
    71                          return (error);
    72
    73                  fidp->fid_len = sizeof (va.va_nodeid);
    74                  bcopy(&va.va_nodeid, fidp->fid_data, fidp->fid_len);
    75                  return (0);
    76          }
    77
    78          return (error);
    79  }

And that means I need to know what VOP_GETATTR() is returning. I'll pull the same trick as before, but I'll need to be looking at, well, I have to figure that out now, don't I?
The trick is to let the vnode tell me:
    int
    fop_getattr(
            vnode_t *vp,
    ...
            err = (*(vp)->v_op->vop_getattr)(vp, vap, flags, cr, ct);

So I need to find the pointer to the function:

    printf("vop_getattr = %p", self->vp->v_op->vop_getattr);

And then I need to find out what that is:
      0  62467 vop_fid_pseudo:entry vop_getattr = fffffffff883c180
    ...
    [root@pnfs-17-24 ~]> mdb -k
    Loading modules: [ unix genunix specfs dtrace mac cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs mpt sd sockfs ip hook neti sctp arp usba stmf fctl nca lofs idm cpc random zfs nfs fcip logindmux ptm sppp ]
    > fffffffff883c180::dis
    zfs_getattr:    pushq  %rbp

Okay, we at least know which filesystem is complaining. I probably could have done this differently, but this way works. The only error in that function comes from zfs_zaccess():
      0  53373 zfs_zaccess:return rc = 0
      0  53125 zfs_getattr:return rc = 0
      0  53125 zfs_getattr:return rc = 0

Which means I was reading the code incorrectly! Note that these are being called by the code, my check forces that to be true, but the error must be elsewhere. It must be the VOP_FID() which is causing the issue!
    static int
    zfs_fid(vnode_t *vp, fid_t *fidp, caller_context_t *ct)
    {
            znode_t         *zp = VTOZ(vp);
            zfsvfs_t        *zfsvfs = zp->z_zfsvfs;
            uint32_t        gen;
            uint64_t        object = zp->z_id;
            zfid_short_t    *zfid;
            int             size, i;

            ZFS_ENTER(zfsvfs);
            ZFS_VERIFY_ZP(zp);
            gen = (uint32_t)zp->z_gen;

            size = (zfsvfs->z_parent != zfsvfs) ? LONG_FID_LEN : SHORT_FID_LEN;
            if (fidp->fid_len < size) {
                    fidp->fid_len = size;
                    ZFS_EXIT(zfsvfs);
                    return (ENOSPC);
            }

Some quick DTrace will tell me I'm correct:
    ::zfs_fid:entry
    /self->live == 1/
    {
            zp = (znode_t *)(self->vp)->v_data;
            zfsvfs = zp->z_zfsvfs;

            lfid = sizeof (zfid_long_t) - sizeof (uint16_t);
            sfid = sizeof (zfid_short_t) - sizeof (uint16_t);

            size = (zfsvfs->z_parent != zfsvfs) ? lfid : sfid;

            printf("fidlen %s size! (%d, %d, %d, %d)",
                self->fidp->un._fid.len < size ? "<" : "not <",
                self->fidp->un._fid.len, size, lfid, sfid);
    }
    ...
      0  53162 zfs_fid:entry fidlen

So is this new? Or something I introduced? I don't see how the code ever worked here. We pass a fid_t off the stack, i.e., uninitialized, into vop_fid_pseudo() and when we call zfs_fid(), that check should trigger every time. The comments on lines 57-61 lead me to think that vop_fid_pseudo() is not intended for types other than NFS. In any event, the other callers of it do:
    makefh4(nfs_fh4 *fh, vnode_t *vp, struct exportinfo *exi)
    ...
            fid_t fid;
    ...
            bzero(&fid, sizeof (fid));
            fid.fid_len = MAXFIDSZ;

With our caller initializing the fid the same way, the file should have been removed!
      0  67453 zfs_fid:return rc = 0
      0  81693 vop_fid_pseudo:entry vop_getattr = fffffffff8088180
      0  67452 zfs_fid:entry fidlen not

From the snoop we can both see the MDS communicating with the DS and that I need to go teach snoop about a remove!
    CTL-MDS:  ----- Sun CTL-MDS -----
    CTL-MDS:
    CTL-MDS:  Proc = 10 (Remove object(s) or entire fsid at the DS)
    CTL-MDS:

While I fixed a valid bug (and we are past the code that was panicking at the BAT), I doubt I fixed the problem we had been seeing:
Before a huge copy:
    [root@pnfs-17-22 ~]> zfs list
    NAME    USED  AVAIL  REFER  MOUNTPOINT
    pnfs1  9.35G  6.28G    23K  /pnfs1

After a huge copy:

    [root@pnfs-17-22 ~]> zfs list
    NAME    USED  AVAIL  REFER  MOUNTPOINT
    pnfs1  12.5G  3.09G    23K  /pnfs1

And after the delete:

    [root@pnfs-17-22 ~]> zfs list
    NAME    USED  AVAIL  REFER  MOUNTPOINT
    pnfs1  12.5G  3.09G    23K  /pnfs1

Note, the DS did report that it tried to delete the file:
    [root@pnfs-17-22 ~]> ./remove.d
    dtrace: script './remove.d' matched 2 probes
    CPU     ID                    FUNCTION:NAME
      0  62434 ctl_mds_srv_remove:return status = 0
      1  62434 ctl_mds_srv_remove:return status = 0
      1  62434 ctl_mds_srv_remove:return status = 0
      0  62434 ctl_mds_srv_remove:return status = 0
      1  62434 ctl_mds_srv_remove:return status = 0
      1  62434 ctl_mds_srv_remove:return status = 0
      0  62434 ctl_mds_srv_remove:return status = 0

Amazingly enough, I had forgotten I was running that DTrace script. So the issue is that ctl_mds_srv_remove() reports success, but the file contents are still there.
I'm starting to wonder if I haven't hit my head on a known ZFS bug that is fixed in 118. The pNFS code is based on 117. I'll investigate.
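The fid_len problem from earlier in this post can be modeled in userland. This is only a sketch, not the kernel code: model_fid_t, model_fid_get(), MODEL_MAXFIDSZ, and MODEL_SHORT_FID_LEN are hypothetical stand-ins for fid_t, zfs_fid(), MAXFIDSZ, and SHORT_FID_LEN, and a zeroed length stands in for whatever happened to be on the stack. The value 10 is taken from the "fid len = 10" seen in the trace.

```c
#include <errno.h>
#include <string.h>

#define MODEL_MAXFIDSZ          64  /* hypothetical stand-in for MAXFIDSZ */
#define MODEL_SHORT_FID_LEN     10  /* the "fid len = 10" from the trace */

/* Simplified, hypothetical stand-in for the kernel's fid_t. */
typedef struct model_fid {
        unsigned short  fid_len;
        char            fid_data[MODEL_MAXFIDSZ];
} model_fid_t;

/*
 * Mimics the size check at the top of zfs_fid(): the caller must
 * advertise how much room it has in fid_len before the call.
 */
static int
model_fid_get(model_fid_t *fidp)
{
        if (fidp->fid_len < MODEL_SHORT_FID_LEN) {
                /* Tell the caller how much room is needed, then fail. */
                fidp->fid_len = MODEL_SHORT_FID_LEN;
                return (ENOSPC);
        }

        /* Pretend to fill in the fid and record its real length. */
        memset(fidp->fid_data, 0xab, MODEL_SHORT_FID_LEN);
        fidp->fid_len = MODEL_SHORT_FID_LEN;
        return (0);
}

/* The makefh4() pattern: zero the fid and advertise its capacity. */
static int
model_caller_initialized(void)
{
        model_fid_t fid;

        memset(&fid, 0, sizeof (fid));
        fid.fid_len = MODEL_MAXFIDSZ;
        return (model_fid_get(&fid));
}

/* The failing pattern: fid_len never set to the available space. */
static int
model_caller_too_small(void)
{
        model_fid_t fid;

        memset(&fid, 0, sizeof (fid));  /* fid_len stays 0 */
        return (model_fid_get(&fid));
}
```

In this model, model_caller_initialized() returns 0 while model_caller_too_small() returns ENOSPC, which is the same 28 that showed up in the DTrace output.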
Originally posted on Kool Aid Served Daily. Copyright (C) 2009, Kool Aid Served Daily
Regression testing saves my bacon
So, my unit testing is all done and it is time to integrate. Bzzt! Wrong answer - we have to do regression testing. In our case, it is called pNFS/miniPIT and was mainly crafted by Helen Chao. And it was bombing on my new code.
First of all, the core was confusing:
    [root@pnfs-minipit2-4 ~]> panic[cpu0]/thread=30008a61560: 000000047ad BAD TRAP: type=31 rp=2a100f596d0 addr=14 mmu_fsr=0 occurred in module "nfssrv" due to a NULL pointer dereference

    nfsd: trap type = 0x31
    addr=0x14
    pid=100528, pc=0x7afb3d64, sp=0x2a100f58f71, tstate=0x4414001604, context=0x29b
    g1-g7: 7be42af4, 2, 2000, 0, 0, 29b, 30008a61560

The NULL pointer had nothing to do with the function being called. Secondly, I focused on trying to reproduce it inside the test harness - it wasn't until I made a simple test case that I progressed.

Third, I focused on the server being 32-bit instead of the real difference - it was not set up as an MDS, and therefore it had no DSes reporting to it. I'll show why that is important:
    4463                  error = nfs41_spe_allocate(&spe_va, claddr,
    4464                      vp->v_path, &lc, TRUE);
    4465                  if (error) {
    4466                          /*
    4467                           * XXX: Until we get the SMF code
    4468                           * in place, we handle all errors by
    4469                           * using the default layout of the
    4470                           * old prototype code
    4471                           *
    4472                           * At that point, we should return the
    4473                           * given error.
    4474                           *
    4475                           * XXX: Any way *plo is NULL here?
    4476                           */
    4477                          *plo = mds_gen_default_layout(cs->instp);
    4478
    4479                          /*
    4480                           * Record the layout, don't get bent out of shape
    4481                           * if it fails, we'll try again at checkstate time.
    4482                           */
    4483                          (void) mds_put_layout(*plo, vp);
    4484
    4485                          return (NFS4_OK);
    4486                  }
    4487
    4488                  /*
    4489                   * XXX: Any way *plo is NULL here?
    4490                   */
    4491                  *plo = mds_add_layout(&lc);

The fourth, and major problem, is that I had valid comments and I ignored them. Yes, it was certainly valid for the *plo to be NULL after 4477 but probably not 4491. But they are both easy enough to code for!
The issue is that an NFSv4.1 server without pNFS also goes through this code path. We need to handle not having a layout:
                             * At that point, we should return the
                             * given error.
                             */
                            *plo = mds_gen_default_layout(cs->instp);
                            if (*plo == NULL) {
                                    status = NFS4ERR_LAYOUTUNAVAILABLE;
                            } else {
                                    /*
                                     * Record the layout, don't get
                                     * bent out of shape if it fails,
                                     * we'll try again at checkstate time.
                                     */
                                    (void) mds_put_layout(*plo, vp);
                            }

                            return (status);

Which appears okay, except with just this change, files will not be created. We want to return NFS4ERR_LAYOUTUNAVAILABLE, mainly for DTrace probing, but in the caller, we want to do:
            } else {
                    status = mds_createfile_get_layout(req, vp, cs, &ct, plo);

                    /*
                     * Allow mds_createfile_get_layout() to be verbose
                     * in what it presents as a status, but be aware
                     * that it is permissible to not generate a
                     * layout.
                     */
                    if (status == NFS4ERR_LAYOUTUNAVAILABLE) {
                            status = NFS4_OK;
                    }
            }
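The split between a verbose inner status and a forgiving caller can be sketched in plain C. Everything here is a hypothetical model, not the nfssrv code: model_stat_t, model_layout_t, and the model_* functions are invented for illustration, and ds_count is an assumed stand-in for "how many DSes have reported to this server".

```c
#include <stddef.h>

/* Hypothetical status codes modeled on NFS4_OK and NFS4ERR_LAYOUTUNAVAILABLE. */
typedef enum {
        MODEL_OK = 0,
        MODEL_LAYOUTUNAVAILABLE = 10059
} model_stat_t;

/* Hypothetical stand-in for a layout. */
typedef struct model_layout {
        int     id;
} model_layout_t;

static model_layout_t model_default_layout = { 1 };

/* With no data servers there is nothing to build a layout from. */
static model_layout_t *
model_gen_default_layout(int ds_count)
{
        return ((ds_count > 0) ? &model_default_layout : NULL);
}

/* Inner function: be verbose and report the missing layout. */
static model_stat_t
model_get_layout(int ds_count, model_layout_t **plo)
{
        *plo = model_gen_default_layout(ds_count);
        if (*plo == NULL)
                return (MODEL_LAYOUTUNAVAILABLE);

        return (MODEL_OK);
}

/* Caller: a file may be created without a layout, so forgive that status. */
static model_stat_t
model_createfile(int ds_count, model_layout_t **plo)
{
        model_stat_t status = model_get_layout(ds_count, plo);

        if (status == MODEL_LAYOUTUNAVAILABLE)
                status = MODEL_OK;

        return (status);
}
```

The inner function keeps the distinct status, which is what a probe would want to see, while the caller still succeeds and simply leaves *plo as NULL when no DS is available.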
Tracking down a nasty delegation return bug
I've been fixing up a refcnt bug I found in recall_read_delegations(). One of the things I found out while in Austin was how to unit test those changes. We need an NFSv4.0 and an NFSv4.1 client both reading a file from an NFSv4.1 server. Note that we can't have a pNFS server, because we do not yet support a mount option to force a 4.1 over a 4.0 mount. And we may never.
Anyway, you have two clients with a read delegation and force a recall by touching the files from another client.
I know it is a good test case because the server consistently crashes:
    panic[cpu0]/thread=ffffff02d457db40: kernel heap corruption detected

    ffffff001054de90 genunix:kmem_error+4a9 ()
    ffffff001054deb0 genunix:kmem_free+d0 ()
    ffffff001054df30 nfssrv:recall_read_delegations+8e8 ()
    ffffff001054df90 nfssrv:deleg_rd_setattr+45 ()
    ffffff001054e000 genunix:vnext_setattr+84 ()
    ffffff001054e060 nfssrv:deleg_rd_setattr+66 ()
    ffffff001054e0e0 genunix:vhead_setattr+9f ()
    ffffff001054e140 genunix:fop_setattr+ad ()

Firefox rules! My UPS just failed and I thought I lost all of this. Yeah Firefox for saving the day!
I added DTrace probes, printf() statements when the probes failed to fire, and finally delays in the code. I couldn't see anything. I finally decided to remove all of that code and worked on cleaning the original code up. There was a rfs4_file_t pointer (fp), array of pointers (fpa), and last active pointer (fap). I was getting confused.
I fixed all that and while the panic was still occurring, the code was easier to read. I finally started adding some ASSERTs and nailed down the problem!
If we look at the original code:
     89          /*
     90           * Is there more than one file structure for this vp?
     91           * Get the vsd for each instance of the server, if it exists.
     92           */
     93          fpa = kmem_zalloc((sizeof (rfs4_file_t *) * nsi_count), KM_SLEEP);
     94          DTRACE_NFSV4_1(rrd__i__fpa_alloc, int, nsi_count);
     95
     96          fpa[0] = fp;
     97          cnt = 1;
     98          mutex_enter(&vp->v_lock);
     99          for (instp = list_head(&nsi_head); instp != NULL;
    100              instp = list_next(&nsi_head, &instp->nsi_list)) {
    101                  fpa[cnt] = (rfs4_file_t *)vsd_get(vp, instp->vkey);
    102                  if (fpa[cnt] && (fpa[cnt] != fp))
    103                          cnt++;
    104          }
    105          mutex_exit(&vp->v_lock);

I said that before 101, it should be the case that cnt is less than nsi_count. I added an ASSERT and it triggered right away. I looked back at the code and while I still thought the ASSERT was valid, I saw why it could trigger in a good code path. Consider a case where the number of items in the list is greater than nsi_count. Then consider that we immediately fill in the remaining items in fpa, leaving some items in the list. We then read an item, but we don't mean to store it.
That, I believe, is where the bug came from in the original author's mind: either assuming there are never more than nsi_count entries in the list, or not considering that there can be more valid ones. Another point here is that there is no way to guarantee that the first item returned in the list is fp.
My fix is this:
            for (instp = list_head(&nsi_head); instp != NULL;
                instp = list_next(&nsi_head, &instp->nsi_list)) {
                    rfs4_file_t *temp;

                    temp = (rfs4_file_t *)vsd_get(vp, instp->vkey);
                    if (temp && (temp != fp)) {
                            ASSERT(cnt < nsi_count);
                            fpa[cnt++] = temp;
                    }
            }

It works as a great defensive fix, but I'm concerned it misses the bigger issue: we should either count the items in the list first for the allocation, or determine why we have more entries than we expect. I'll have to confer with my colleague once he gets back from vacation...
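The "count the items first" alternative can be sketched in userland C. This is an assumption-laden model, not the nfssrv fix: model_inst_t is a hypothetical singly linked stand-in for the nsi_head list, its file member plays the role of vsd_get(vp, instp->vkey), and calloc() replaces kmem_zalloc(). In the kernel, both passes would have to run under the same lock so the list cannot change between counting and filling.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one server instance on the list. */
typedef struct model_inst {
        struct model_inst       *next;
        void                    *file;  /* what vsd_get() would return */
} model_inst_t;

/*
 * Collect fp plus every other non-NULL file found on the list.
 * Returns a calloc()ed array (caller frees) and stores the number
 * of entries in *cntp; returns NULL if the allocation fails.
 */
static void **
model_collect_files(model_inst_t *head, void *fp, size_t *cntp)
{
        size_t          need = 1;       /* slot 0 always holds fp */
        size_t          cnt = 0;
        model_inst_t    *ip;
        void            **fpa;

        /* First pass: size the array from what is really on the list. */
        for (ip = head; ip != NULL; ip = ip->next) {
                if (ip->file != NULL && ip->file != fp)
                        need++;
        }

        fpa = calloc(need, sizeof (*fpa));
        if (fpa == NULL)
                return (NULL);

        /* Second pass: store only the entries we mean to keep. */
        fpa[cnt++] = fp;
        for (ip = head; ip != NULL; ip = ip->next) {
                if (ip->file != NULL && ip->file != fp) {
                        assert(cnt < need);
                        fpa[cnt++] = ip->file;
                }
        }

        *cntp = cnt;
        return (fpa);
}
```

Because the array is sized from the list itself, a list longer than expected can no longer write past the end of the allocation, and the read-into-a-temporary pattern from the fix above is preserved.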
Code walk through was a success
The code walk through in Austin was great for me - I ended up resolving all of my open issues and several bugs were found. I need to do another round of unit testing and then start running our regression test suite.
I also got some great help in putting the spe daemon into smf:
    [root@pnfs-17-24 nfs]> svcadm enable svc:/network/nfs/spe
    [root@pnfs-17-24 nfs]> ps -ef | grep spe
        root 100882 100854   0 11:02:47 pts/1       0:00 grep spe
        root 100880 100004   0 11:02:30 ?           0:00 /usr/lib/nfs/sped

Not sure I have the dependencies right, but that may be because I have such a horribly hacked together test rig. I'll have to do a fresh install to make sure things start right the first time.
Off to Austin for a code walkthrough
I'll be in Austin this week getting approval for the adoption of mds_sids to manage devices, layouts (in core and on disk), and file handles. Oh, and to add kspe!
For those of you interested, the code is over here: http://cr.opensolaris.org/~tdh/perse/. I've removed the spe debug printfs and added DTRACE probes as appropriate.
On my todo list is to perhaps remove some debug DTrace probes (I don't see a real need for them in shipping code, they are kinda just kernel printfs) and to determine whether kspe lives in the nfs or nfssrv module. I made a half-hearted attempt to stuff it back into the nfs module and got reminded why I had it in nfssrv in the first place: kspe needs to get at mds_sid info, and the kspe globals need to be abstracted such that we aren't in the way of zonification. That last is the kicker - everything in the nfs module needs to know everything about kspe and it gets messy.
