In early May, the National Highway Traffic Safety Administration (NHTSA) released the March 2004 listing of vehicle recalls. Included in the report are large numbers of "potentially involved" vehicles. As in as many as 938,789 2000-03 Taurus/Sable models ("to correct a problem with a malfunctioning stop lamp switch and/or associated wiring. This malfunctioning could render the stop lamps inoperable or cause them to stay on all the time. . . .") As many as 3,662,211 GM '00 to '04 Silverados, Sierras, Avalanches, and Escalade EXTs ("the galvanized steel tailgate support cables that retain the tailgate in the full open (horizontal) position may corrode, weaken, and eventually fracture. . . . ") There are more. Plenty more. Chryslers and Bentleys. Maseratis and Winnebagos. Millions of cars. On an annual basis, billions of dollars. To say nothing of all of the people who may lose their interest in a particular vehicle manufacturer's products and spend their dollars elsewhere. Recalls are a non-trivial problem.
A company that has its roots at the University of Utah may have a means to address vehicle problems that crop up before they aggregate to the tipping point of recalls. As Craig Norris, CEO of Attensity Corp. (Mountain View, CA—but the tech people are still in Salt Lake City; www.attensity.com) puts it, "For 10 years our founders spent time slogging it out at the University of Utah trying to figure out in a unique way to get more and better information out of text so that it might be made actionable." And the texts are forms that service departments everywhere are dealing with on a daily basis.
It's linguistics meets computer science. And the result is something known as "natural language processing." Which is simply (well, simple to say, anyway) a method that uses computers to process written language, such as the "comments" field in a warranty claim, for a useful purpose. Like finding problems fast. And taking action. Realize that when you look at those numbers of vehicles involved in recalls, there are a whole lot of warranty claims that may need to be read, and which probably aren't, given the vast numbers. But the Attensity approach handles this via silicon, not eyeballs. And while this technology may sound as though it is something that you'd have to bust out the supercomputers to handle, Attensity's Relational Extraction Server can run on a PC with a Pentium 4 and Windows XP.
NOT JUST A CUSTOMER. AN INVESTOR.
How encouraging this is depends entirely on your point of view: One of the investors in Attensity is In-Q-Tel. Which is an investment firm established by the CIA. (Apparently that's not a secret.) One of the examples that's used to explain the function of the company's Relational Extraction Server is the following sentence:
John Doe bought C4 from John Smith in Cairo on October 4, 2000.
Given the amount of information that outfits like the CIA must process, having a natural language processor can certainly be advantageous. Which explains the CIA's interest.
But while we're on the subject of sentences, here's one that Norris provides that comes from the auto industry:
The bolt on the underside of the transmission cracked due to heat.
He explains that a company Attensity is working with had been using another language processor. That language processor came to the conclusion that the transmission cracked, not the bolt, because the two words are next to each other. Attensity's algorithms came up with the right interpretation: the bolt cracked. That's because they do something that other systems apparently don't do.
THIS WOULD HAVE MADE ENGLISH CLASS A WHOLE LOT EASIER.
Simply stated, Attensity's product diagrams sentences. Remember the whole subject, verb, object, modifiers thing? That's what they do. Norris: "We're able to diagram sentences very fast: We can do Moby Dick in less than five seconds on a simple one-processor machine."
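To see why structure beats proximity, here is a toy sketch in Python. It is emphatically not Attensity's algorithm, which the company doesn't spell out; the word lists and both function names are made up for illustration, and the "structural" rule only handles simple sentences like the bolt example. The first function guesses that the word sitting next to the verb is the subject. The second takes the head noun of the opening noun phrase and treats everything after the first preposition as a modifier.

```python
# Toy contrast between word adjacency and sentence structure.
# Hypothetical helpers for illustration; not Attensity's method.

PREPOSITIONS = {"on", "of", "in", "under", "at", "by", "from", "to"}
DETERMINERS = {"the", "a", "an"}


def subject_by_adjacency(words, verb):
    """Naive guess: the word immediately before the verb is its subject."""
    return words[words.index(verb) - 1]


def subject_by_structure(words, verb):
    """Return the head noun of the first noun phrase: the last non-determiner
    word before the first preposition (or before the verb if there is none)."""
    before_verb = words[: words.index(verb)]
    head = None
    for word in before_verb:
        if word in PREPOSITIONS:
            break  # what follows only modifies the head noun
        if word not in DETERMINERS:
            head = word
    return head


sentence = "The bolt on the underside of the transmission cracked due to heat"
words = sentence.lower().split()

print(subject_by_adjacency(words, "cracked"))  # transmission
print(subject_by_structure(words, "cracked"))  # bolt
```

A real parser has to cope with far messier text than this, but the contrast is the point: the word nearest the verb is not necessarily the thing that broke.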
So what difference does that make to building better cars, trucks, and the like? Plenty.
TALKING IN CODES.
The good news about language—the regular lingua franca that we use—is that it is so flexible. Which is also the bad news about language: It is open to interpretation.
So when it comes to things like warranty forms, instead of using discursive language, code is deployed. One of the first companies in vehicle manufacturing that Attensity worked with was John Deere. Recalling what they learned early on in the engagement, Bart O'Brien, Attensity vp of Business Development, says, "We were surprised at how useless the codes were for actually identifying problems."
And there are plenty of reasons why. For one thing, he says, "Codes have a fairly distant relationship to the cause of the failure unless the failure is strictly caused by a component failing." He explains, "The reporting system is component-oriented, so it almost assumes that all problems are component failures. But given the advent of things like Six Sigma, the components are probably the least likely source of failure."
For another: "Codes tend to disguise failures that have happened for other reasons—particularly assembly-related reasons that have no part number."
"You can have one problem that's coded in multiple different ways. And for logical reasons," says Norris. One reason is that there may not be a code that describes the actual problem, so they have to make do with what's available. (Or there could be a code, but it is so uncommon that the technician would have to look it up, and he's not paid to do that, so he goes with what he knows for the sake of expedience.) Another reason is that since the parts often flow through the warranty system, the problem is identified as being with the most expensive part involved since they're looking for reimbursement, so the prime part ID'd tends to be that costly one. (The stories that suppliers can tell about receiving parts back that are ostensibly "faulty" but that work just fine are undoubtedly legion.)
It's not that the technicians aren't trying to do the right thing. According to O'Brien, "We did an analysis and found the technicians were doing a good job of picking the best codes of those offered to them." But what's being offered to them may not describe the real problem. They cite a situation with a Tier One seating supplier. Heated seats weren't heating. The codes used to describe the problem indicated that there were electrical accessory failures. The real issue was that the wrong seat backs were being installed. But there was no code for that. "There are certain problems, especially assembly problems, that get spread over many different part numbers," O'Brien says. This is because there's no readily available code to describe that problem. "It forces what we call statistical dilution. People are struggling to describe something that can't be described in code." So they pick various part numbers, and the problems consequently get spread around, or diluted.
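A back-of-the-envelope sketch of statistical dilution, in Python. Everything in it is hypothetical: the part codes, the claim counts, and the assumption that the cause can be pulled out of the technician's comment. It shows what the seat-back story looks like when you count the same ten claims two ways.

```python
from collections import Counter

# Hypothetical claims: (part code picked by the technician,
#                       cause described in the comment text)
claims = (
    [("SEAT-HEATER", "wrong seat back")] * 3
    + [("SEAT-SWITCH", "wrong seat back")] * 3
    + [("SEAT-WIRING", "wrong seat back")] * 2
    + [("SEAT-HEATER", "element burned out")] * 2
)

# Counting by code spreads the one assembly problem across three part codes.
by_code = Counter(code for code, _ in claims)

# Counting by cause puts it back together.
by_cause = Counter(cause for _, cause in claims)

print(by_code.most_common())
print(by_cause.most_common())
```

Counted by code, no single part accounts for more than five of the ten claims, and the biggest bucket mixes two unrelated causes. Counted by cause, eight of the ten point at the same assembly problem.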
"Look at the universe of problems," O'Brien posits. Three categories: Component failure. Design problems. Assembly problems. "Design and assembly problems can mask themselves as a component failure: If I miswire the fuse box in a tractor, it's going to blow the alternator. So that part needs to be replaced." And that part is undoubtedly going to be the one identified as the one that there was a problem with. The wiring issue gets disguised. "The reporting system itself is component-oriented, so it almost assumes all problems are component problems. It's not going to be reported as an assembly problem." At least not in terms of filling in the Complaint, Cause, Correction fields on the typical form.
KEEPING UP WITH THE READING.
When the problems are complex, as many tend to be nowadays, and the numbers of vehicles and claims are huge, there are exceedingly complex issues to be resolved. And some of them can be resolved with natural language processing.
So why don't people just read what the technicians write in their comments? For one thing, there is a limited number of people who are available to do the reading and a seemingly increasing number of forms to read. What's more, when technicians from around the country write about the very same problem, they do so in a variety of styles with a variety of jargon. This is what's called "unstructured data." It may be rich in information, but from the point of view of a person, there are issues. For one thing, even if you had a person reading all of the warranty forms, she wouldn't know what she was looking for. "That's a problem," O'Brien says, "because the things you're trying to solve are happening at typically less than a 1% incidence level, and the human mind cannot really pick up things at such a low level of incidence, especially when they're coming in over time."
Which is where Attensity comes in. "Our belief," O'Brien intones, "is that only by doing a consistent mass conversion of this rich, unstructured form of data into a structured form that can be graphed and analyzed can you start picking up low-level problems that you want to solve." He's talking about picking them up when they're at a level of 0.1%.
BACK TO THE DIAGRAMMING.
Essentially, the Attensity products take the forms that come in and diagram the sentences in the texts produced by the technicians. This allows them to extract the who, what, why, when, where of the situation. This information is then "relationalized": put into rows and columns. Because the system diagrams the sentences, the right words go in the appropriate categories.
For example, back to the cracked transmission/bolt sentence. Using the Attensity linguistic extraction and monitoring tools, the note "The bolt on the underside of the transmission case cracked due to the heat" can be organized:
|Event Type|Part affected|Area on vehicle|Cause*|
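One way to picture a "relationalized" note is as a single record with one field per column. The Python sketch below is an assumption-laden illustration: the class name is invented, the field names simply mirror the column headings, and the values are a plain reading of the example note, not output from Attensity's software.

```python
from dataclasses import dataclass, asdict


@dataclass
class ExtractedEvent:
    # Hypothetical record type; fields mirror the column headings.
    event_type: str
    part_affected: str
    area_on_vehicle: str
    cause: str


note = "The bolt on the underside of the transmission case cracked due to the heat"

# Values filled in by hand from the note, for illustration only.
row = ExtractedEvent(
    event_type="cracked",
    part_affected="bolt",
    area_on_vehicle="underside of the transmission case",
    cause="heat",
)

print(asdict(row))
```

Once every comment is a row like this, the usual spreadsheet and database tools can sort, count, and chart them.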
Norris says that once the information is set into a structured format, executives and engineers can gain a better understanding of what's occurring and they can apply existing analytic tools to focus their people on the real cause of the problems: "There is always a tension between the number of problems that I have and know about and my resource capability to assign to try to solve them." If the existing part-centric approach holds sway, and if there is statistical dilution as various technicians indicate various parts as being the cause, then the likelihood of recalls continuing is high.
Consider the aforementioned alternator/wiring problem. People responsible for the alternator are tasked with solving the problem. They determine that there is nothing wrong with the alternator. So they start reading through the warranty claims. They read the verbatim reports and realize that it is actually a wiring harness problem. So they send it to that group. Norris suggests that this is too time-consuming. "We take the verbatim information that's used at the end of the process and move it to the front end. Let's ID where the problem is likely to be. If I know that we've got a specific problem, then we can send it to the factory, and they can simply walk out on the floor and ask what's going on with the wiring harness." Maybe the problem is with the design or the assembly, not the harness itself. And certainly not the alternator.
THE IMPORTANCE OF RELATIONALIZED INFORMATION.
Norris estimates that as much as 85% of the information that's available to corporations is not in rows and columns. It's not relationalized. It's in text. And there's an increasing amount of information out there. So what's happening? Decisions are being made about things without complete information.
Norris tells a little story about the consequences of this. Say your arm hurt. You go to a doctor. She does tests and tells you to come back in a week. You come back in a week. She says that your arm is to be amputated. ("My first thought would be that my arm feels better," Norris says.) You ask how she came to that conclusion. She answers that she used 15% of the information from the test and ignored the rest. Chances are, you'd ask her to look at the other 85%.
*For those of you who are wondering about the way that "John Doe bought C4 from John Smith in Cairo on October 4, 2000" was handled:
|Event type|Sub-type|Potential Bomber|
|Terrorism|Bomb making|John Doe|
|Material|Place of Purchase|Date of Purchase|
|C4|Cairo|October 4, 2000|