Wednesday, February 12, 2014

Automatic Classification According to eCl@ss

The product classification system eCl@ss has established itself as the preferred standard in many industries. Initiated and funded by major German companies and later adopted by federal procurement and some of the federal states, eCl@ss today plays an increasingly leading role in the implementation of electronic commerce for SMEs (small and medium-sized enterprises). That said, classifying product master data according to eCl@ss is by no means trivial. eCl@ss is more than just a product group structure, and even assigning items to product groups can be an endeavor that is hardly manageable without automation, particularly with data sets that run into the hundreds of thousands or even millions.

Yet whoever wants to classify product master data completely according to eCl@ss also has to consider the so-called feature lists. Moreover, eCl@ss cannot be operated correctly and consistently unless every article is supplied with a customs tariff number. We'll come to that in a later blog post.

Classifying on the "green field"

Here, as elsewhere, every journey begins with the first step: every eCl@ss introduction should start with a product category assignment. On the one hand, it can be quite useful if another product group system (a company's in-house one, for example) is already in place. On the other hand, such a starting point can tempt a company down the wrong path, namely trying to derive the eCl@ss category from the existing product group. At first this approach appears obvious and sensible, as it makes use of information that is already available. In the long run, though, entire departments often end up occupied with creating mapping tables intended to map the product groups onto each other. In the end it typically turns out that such a mapping can be defined only with great difficulty or not at all, unless the granularity of the two product group systems happens to be identical down to the last detail.

Let's illustrate this with a specific example.
Assume that a company’s product master data has been classified according to an internal product group system. This system might, for example, divide flow pumps into axial, diagonal and radial pumps. Such a product grouping may well occur in industrial practice.

eCl@ss, however, offers very different categories:


  • Submersible pump
  • Circulation accelerator pump
  • Ship lift (pressure increase)
  • Centrifugal pump with shaft seal
  • Centrifugal pump with canned motor
  • Centrifugal pump with magnetic coupling
  • Other unspecified centrifugal pump

Obviously, there is no direct correspondence between the two systems. The product group in eCl@ss is largely determined by the sealing system (shaft seal, magnetic coupling, ...) of the centrifugal pump, while the design (axial, diagonal or radial) is only recorded as an additional feature. In our assumed example, however, the design is what defines the product group itself. No matter how you look at it, the eCl@ss category cannot be derived, at least not as long as one considers only the internally assigned product group. Unfortunately, in practice this often leads to "pragmatically" attributing a shaft seal to each and every pump. Alternatively, and even worse, all pumps may simply be subsumed under "Other unspecified centrifugal pump". It should be clear that this "solution" is far from pragmatic; it is simply sloppy.
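The reason a mapping table fails can be made concrete with a small check. The following is only a sketch on invented sample data: the article pairs and the helper name ambiguous_groups are hypothetical and serve purely as illustration. It lists every internal product group whose articles end up in more than one eCl@ss category, which is exactly the situation in which no single table entry can be correct.

```python
from collections import defaultdict

# Hypothetical, already classified articles:
# (internal product group, eCl@ss category). Invented for illustration.
CLASSIFIED = [
    ("radial pump", "Centrifugal pump with shaft seal"),
    ("radial pump", "Centrifugal pump with magnetic coupling"),
    ("axial pump", "Centrifugal pump with shaft seal"),
    ("axial pump", "Centrifugal pump with canned motor"),
    ("diagonal pump", "Centrifugal pump with shaft seal"),
]


def ambiguous_groups(pairs):
    """Return the internal groups that occur with more than one eCl@ss category."""
    targets = defaultdict(set)
    for group, category in pairs:
        targets[group].add(category)
    # A mapping table needs exactly one target per group; keep the violators.
    return {g: sorted(c) for g, c in targets.items() if len(c) > 1}


if __name__ == "__main__":
    for group, categories in ambiguous_groups(CLASSIFIED).items():
        print(group, "->", categories)
```

In this invented sample, the radial and axial pumps each appear with two different sealing systems, so the internal group alone does not determine the eCl@ss category.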

The answer to this problem is relatively simple, though. In addition to the internal product category, all available information about the article needs to be evaluated. If, for example, the article description includes the term "magnetically coupled", then this information should obviously be used rather than ignored. To implement this approach effectively, software is needed that can use existing master data for automatic classification by means of machine learning algorithms. Roughly speaking, such software uses known examples (the learning set) to calculate how the occurrence of the term "magnetically coupled" in the article description affects the (conditional) probability that the article in question is indeed a "centrifugal pump with magnetic coupling". The calculated values are then used to predict the categories of the remaining, as yet unclassified articles. In fact, algorithms for automatic classification work quite satisfactorily even if no internal product group system was in use prior to the introduction of eCl@ss.
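To make the principle tangible, here is a minimal sketch of one possible technique, a naive Bayes text classifier with add-one smoothing. This is an assumption chosen for illustration; the post does not say which algorithm a real product would use. The learning set, the function names train and predict, and the sample descriptions are all invented.

```python
import math
from collections import Counter, defaultdict

# Hypothetical learning set: (article description, eCl@ss category).
LEARNING_SET = [
    ("radial pump magnetically coupled stainless steel",
     "Centrifugal pump with magnetic coupling"),
    ("axial pump magnetically coupled",
     "Centrifugal pump with magnetic coupling"),
    ("radial pump mechanical shaft seal",
     "Centrifugal pump with shaft seal"),
    ("diagonal pump shaft seal cast iron",
     "Centrifugal pump with shaft seal"),
    ("radial pump canned motor",
     "Centrifugal pump with canned motor"),
    ("submersible pump for waste water",
     "Submersible pump"),
]


def train(examples):
    """Count how often each term occurs in each category."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for term in text.lower().split():
            term_counts[label][term] += 1
            vocabulary.add(term)
    return class_counts, term_counts, vocabulary


def predict(text, model):
    """Return a dict of category -> estimated probability for the description."""
    class_counts, term_counts, vocabulary = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # prior probability of the category
        denom = sum(term_counts[label].values()) + len(vocabulary)
        for term in text.lower().split():
            if term in vocabulary:  # terms never seen in training are skipped
                # Add-one smoothing avoids zero probabilities.
                score += math.log((term_counts[label][term] + 1) / denom)
        log_scores[label] = score
    # Turn the log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {l: math.exp(s - top) for l, s in log_scores.items()}
    norm = sum(weights.values())
    return {l: w / norm for l, w in weights.items()}


if __name__ == "__main__":
    model = train(LEARNING_SET)
    probs = predict("diagonal pump magnetically coupled", model)
    for label, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(f"{p:.2f}  {label}")
```

In this toy learning set, "diagonal" only ever occurs together with a shaft seal, yet the terms "magnetically coupled" outweigh it and the magnetic coupling category receives the highest probability. A realistic setup would need far more examples and better text preprocessing than splitting on whitespace.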

The next blog post will discuss how methods of automatic classification can optimize master data quality.

Holger Joest