Monday, October 9, 2017

Big data as a Tool for Better Master Data

High-quality master data is often decisive for the success of Analytics and Big Data projects. An analysis that starts with incomplete or incorrect Master Data will usually yield incorrect results. One can rightly claim that Master Data in some way lays the foundation for Big Data. This correlation is fairly obvious and has been described many times (see [1], [2], and [3]).

Conversely, there is a connection that at first glance does not seem to be so obvious: analytical methods and Big Data can help to improve the quality of your Master Data. We would like to give you an example of this in the following.

Relevance of Attributes for the Customer's Purchase Decision

When it comes to hundreds of thousands of item records, manual revision and enrichment can become very time-consuming. This makes it all the more important to focus on the relevant attributes. It would be pointless to put effort into improving the data quality of attributes that later turn out to be insignificant. If you succeed in identifying the relevant attributes, you can achieve a greater quality increase with the same effort. But how do you know which attributes are the really important ones?

When assessing the relevance of attributes, the knowledge and experience of stakeholders should be brought in through appropriate participation. In e-commerce, however, involving stakeholders is rather difficult, because the customer is the stakeholder. And hardly any customer is willing to take part in a survey about whether this or that piece of information was the more important one for his purchase decision.

Find Relevance by Means of Choice-based Conjoint Analysis

Here statistics comes to your aid. As early as the 1970s, psychologists and market researchers developed a method called Choice-based Conjoint Analysis (CBCA), which provides information about the "perceived benefit" of a single product characteristic from the customer's point of view. The method still works even when many other product features go into the purchase decision. By applying CBCA, you can derive the benefit of each individual product feature from the total value of the product.

The method is based on the purchasing decisions of customers who have to choose between different but comparable products (e.g. smartphones with different amounts of memory, processor speed, etc.). Such a choice situation can be created in an online shop by simple means, and the customer's behaviour can easily be reconstructed by analyzing the log files. Based on a statistical benefit and decision model, a formula with a number of free parameters is derived. This formula can be used to calculate the probability with which the customer will choose one of the products offered in the choice situation. The parameters are then adjusted iteratively until the calculated values match the customer's actual behaviour as closely as possible (maximum likelihood method). In the end, each of the inferred parameters reflects the partial benefit of a certain product characteristic. From the partial benefits, we can then deduce the relevance of each attribute relative to the other attributes.
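The benefit and decision model described above is typically a multinomial logit model: a product's utility is the sum of the part-worth utilities of its attribute levels, and the probability of choosing a product is the softmax over the utilities of all products in the choice set. A minimal sketch in plain Python (the attribute levels and part-worth values are invented for illustration):

```python
import math

def choice_probabilities(products, part_worths):
    # Each product is a list of attribute levels; its utility is the sum
    # of the part-worth utilities of those levels.
    utilities = [sum(part_worths[level] for level in p) for p in products]
    # Multinomial logit: probability of choosing product i is
    # exp(u_i) / sum_j exp(u_j).
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood(observations, part_worths):
    # observations: list of (choice_set, index_of_chosen_product) pairs,
    # as reconstructed from the shop's log files.
    return sum(
        math.log(choice_probabilities(products, part_worths)[chosen])
        for products, chosen in observations
    )
```

Maximizing this log-likelihood over the part-worth parameters is exactly the iterative adjustment described above; the fitted part worths then indicate each attribute's relevance.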

Big Data: Implementation with Apache Spark

This procedure, especially when applied to e-commerce, quickly produces data volumes on the order of terabytes. Classic software for Conjoint Analysis is not necessarily the best choice in this setting. For one of our customers, we used a Spark/Hadoop cluster to assess the relevance of attributes. We were able to implement the maximum likelihood method relatively easily using the Apache Spark Machine Learning Library (MLlib).
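The post does not show the actual MLlib code, so the following is only a toy sketch of the core computation in plain Python: analytic gradient ascent on the multinomial logit log-likelihood. In the real project this would be expressed with Spark, distributing the per-observation gradient terms across the cluster.

```python
import math

def fit_part_worths(observations, levels, lr=0.1, steps=500):
    # observations: list of (choice_set, chosen_index); each choice set is a
    # list of products, each product a list of attribute levels.
    beta = {lv: 0.0 for lv in levels}  # part-worth utilities, initially zero
    for _ in range(steps):
        grad = {lv: 0.0 for lv in levels}
        for products, chosen in observations:
            utilities = [sum(beta[lv] for lv in p) for p in products]
            m = max(utilities)                 # stabilise the exponentials
            exps = [math.exp(u - m) for u in utilities]
            total = sum(exps)
            probs = [e / total for e in exps]
            # Gradient of the log-likelihood: attribute levels of the chosen
            # product minus their expectation under the current model.
            for lv in products[chosen]:
                grad[lv] += 1.0
            for p, pr in zip(products, probs):
                for lv in p:
                    grad[lv] -= pr
        for lv in levels:
            beta[lv] += lr * grad[lv]          # gradient ascent step
    return beta
```

For two-alternative choice sets the model reduces to logistic regression on feature differences, which maps directly onto MLlib's logistic regression and its built-in optimizers.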


With the help of Choice-based Conjoint Analysis, the relevance of attributes can be determined on the basis of customers' buying decisions in an online shop. This helps to improve the quality of Master Data in a targeted manner.

Tuesday, October 27, 2015

GDSN - What's Next?

Now that the GDSN community is in full swing preparing for the update to Major Release 3, which is scheduled for May 2016, it's a good time to dare a glimpse into the future and consider developments that, as we believe, are likely to become a focus of attention in master data synchronization over the next few years.

Notwithstanding the challenges and efforts ahead, Major Release 3 will be a big step forward. It provides many substantial improvements, some of them long and eagerly awaited, and moreover, we are convinced that it will open up the standard for broader adoption. So far, so good. But what is next? If a standard doesn't evolve, it's dead. Thus, nobody should be surprised that there is room for improvement. The discussions that led the way to Major Release 3 started many years ago, and in the meantime new challenges have come about that have not yet been discussed, let alone solved.

These challenges arise from an overall need for greater agility and responsiveness in the supply chain which in turn demands the same from global data synchronization. Essentially, it all boils down to providing and processing
  • more volatile data,
  • more detailed and fine-grained data, particularly with more attributes,
  • more consumer-targeted data,
  • more interwoven and interdependent data,
  • more data at an earlier stage of the product lifecycle,
  • and more partially available or even premature data.

Moreover, these challenges are amplified by an increasing number of market participants that all require access to product data, particularly
  • more (and smaller) retailers,
  • more verticals,
  • third-party logistics,
  • e-commerce,
  • search engines,
  • mobile applications,
  • and, before anything else, the end consumer.

So why do we think that the GDSN standard in its current form, even with Major Release 3, isn't fit for these challenges? In business and in engineering there are often good reasons to do things exactly the way they're done, and surely one might be tempted to assume that this applies equally well to the GDSN. On the other hand, it's occasionally a good idea to look beyond the horizon. After all, the GDSN community is certainly not alone in its efforts to shuffle data back and forth. And if you look around, you'll notice that the way synchronization is done in the GDSN is not without alternatives. So it's a valid question to ask: can't this be done more simply than with all that rigid and complicated choreography?

But before we delve deeper into the present shortcomings of the GDSN, we should try to understand how it got there, and thus take a look at how it evolved historically. The GDSN was designed to pursue a "push-centric" messaging approach, where the relevant information is actively sent in the form of business messages. This approach has its roots in the EDI standards of the 1980s, namely ANSI X12 and EDIFACT/EANCOM. These were mostly adopted for messaging schemas dealing with purchase orders (ORDERS) and invoices (INVOIC), where a push-centric approach is clearly advisable for legal reasons: before an invoice can become due, it must somehow have been delivered. Similarly, a purchase order can only become binding after the supplier provably received it. In fact, if you need a certain level of guaranteed delivery, usually in the form of a confirmation receipt, then it is better to push the message to your business partner than to rely on them to pull it. It is no surprise, by the way, that in the aftermath of the broader adoption of EDI, an Internet-based transport protocol like AS2 evolved, which provides exactly this kind of confirmation receipt, namely the MDN, on top of a push protocol.

There are several reasons why the push-centric approach initially chosen for purchase orders and invoices sometimes still makes perfect sense for master data synchronization. For example, when you deliver an update on a product item that is highly important for a retailer to know about, then, of course, you want to be able to prove that you did. In general, however, push-centric approaches tend to be inflexible. Why? Because when a sending party starts pushing a message, it cannot know the receiver's most urgent demand, at least not at exactly that point in time.
This is what happens to a retailer in the GDSN when they submit an item subscription to the data pool and the pub-sub match results in hundreds of thousands of item notification messages, which is regularly the case when, for example, the subscription covers an entire target market. The result is often a flood of messages jamming the line for hours or even days, with almost no chance of getting through with any other subscription during that time. If you urgently need the product data for another item at that point in time, you don't have to be a genius to start dreaming of a URL where you could just download™ the desired item data instantaneously and on demand. Technically, what this dream is about is the ability to pull the information, hence a pull approach. And now you understand that the GDSN has implemented only the first half of a push-pull strategy. As the Wikipedia article on push-pull strategies puts it: "On markets the consumers usually 'pull' the goods or information they demand for their needs, while the offerers or suppliers 'push' them toward the consumers."

That's why suppliers feel quite comfortable with the push-centric approach of the GDSN, and it's also why the retailers feel the pain. We see it as inevitable that the GDSN will be supplemented with a pull approach. In other words, we believe that the GDSN needs to get the missing second half of its push-pull strategy. Recent ambitions of GS1, like the "GTIN+ on the web" standard, go exactly in that direction. "GTIN+ on the web" provides schemas for structured product information, very similar to the data structures in a GDSN item notification, but adhering to the rules of linked data. The crucial thing about linked data is that every data object, or "resource" as it is called, has a unique, permanent, and addressable location, namely a URL. With linked data, a product item or even a set of product items has the URL we dreamed of above, from which you can just download the desired data. It's basically the idea behind the World Wide Web that we all know, where every page has a URL, carried forward to the realm of information exchange between software systems.
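To make the pull idea concrete, here is a small hypothetical sketch: each trade item is treated as a linked-data resource with a stable URL derived from its GTIN, and a consumer simply fetches and parses it on demand. The URL scheme, host and field names below are invented for illustration and are not part of any GS1 specification.

```python
import json

def item_url(gtin, base="https://datapool.example.com/items"):
    # In a linked-data world, every trade item has a permanent, addressable
    # location; here we derive it from the GTIN.
    return f"{base}/{gtin}"

def parse_item(document):
    # document: the JSON text a data pool might serve at the item's URL
    # (fetched e.g. with an ordinary HTTP GET).
    data = json.loads(document)
    return {"gtin": data["gtin"], "description": data.get("description")}

sample = '{"gtin": "04012345678901", "description": "Chocolate bar 100 g"}'
item = parse_item(sample)
```

The point is not the code but the interaction style: the retailer asks for exactly the item it needs, exactly when it needs it, instead of waiting for a flood of pushed notifications.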

In our opinion, it’s just a matter of “when”, not “if”, that data pools will pick up on these ideas.

Thursday, June 25, 2015

Change makes for a vibrant daily life

Change management is a paradox in itself, because the term denotes something that is a daily routine in both private and professional life: change and development.

We constantly move from one "setting" to the next, from waking up to the bathroom, from work to time off spent with family and friends. We constantly react to what we encounter and thereby, to a certain degree, alter ourselves as well as our environment. Perhaps precisely because change is so self-evident, change management is sometimes considered a mere luxury within a project.


IT projects in particular are initially about tangible optimisations of processes or structures. They are designed in view of major milestones and measurable effects. Anything in between, especially the "taking along" of colleagues, is considered self-evident by the project management - a given task to be done on the side.

This mindset ignores the fact that not all stakeholders in the project automatically follow the same development. Each staff member, unit, department or company may pursue a different goal. Behind the scenes, hidden agendas and entire dramas unfold: desire, rejection, jealousy, ambition, interests, rebellions. Another aspect is projections and expectations towards the project or the IT that seem irrational and out of place, and therefore remain unaddressed.


Once projects develop in this direction, the focus shifts away from their potential and possibilities, and resistance manifests itself. Only when such unconscious correlations are ignored does a situation arise in which change management has no choice left but to focus on resistance. This is due to the fact that any development implies both opportunities and limitations at the same time. Both have a psychological impact and push towards development. Whether stakeholders perceive development opportunities or cling to the familiar status quo depends on the context and on their participation.


We all know the satisfying experience when something is developed collectively and cooperatively, when something becomes more than just the sum of its parts. We postulate that all stakeholders aspire to this concerted evolution, because we as human beings constantly strive to participate and thus realize ourselves. Yet to establish a common ground that recognizes all the differences, in both small and complex organizations, is a distinct discipline.

This distinct discipline can only be built on a foundation of well-crafted project work. Change management contributes to achieving the designated goals, but it cannot do so on its own. Imperative are a neatly run project and reliable commitment! Because transformational phases are characterised by uncertainty, a stabilising, reliable framework becomes absolutely vital. Only if the "hard facts" are backed up can the other variables be attended to. Yet it is those that make the difference between a project that is merely implemented and one that is fully accepted and filled with life.

Therefore, change management is always a cooperative task at the interfaces of the organization(s), one that synchronises and involves project management, implementation and operative business.


The advisory portfolio of Bayard Consulting includes solutions for the determinants described above in the context of master data management. Initially, our IT experts support you, independently of any specific software, with the technical execution of your project. We understand master data management as a process that brings together stakeholders from different worlds on a technological basis. Accordingly, we support you with the operationalisation of the project management. Change management is established as an interface function at the beginning of the project and continuously carried out with the aim of creating synergies. The relevant correlations are translated into concrete measures and methods that measurably enhance project success and are based on the specific issues at hand.

Daniel Piontek welcomes all questions at

Daniel Piontek is a psychologist who studied in Cologne, Frankfurt and Vienna. For Bayard Consulting he currently oversees an SAP implementation for several hundred employees in the German food retail sector.

Friday, May 29, 2015

Communication and sales promotion 2.0 – stationary retailers sit back and watch

Traditional retailers miss the opportunities that good online communication and online sales promotions provide. Online retailers leverage this space for their own growth. Even small differences in the details can have an appreciable impact.

The evolution of online food retailing paints an interesting picture. Emerging online retailers, often Internet pure players (IPP), typically offer a very selective assortment, like tea and coffee or muesli. They conquer a specific market segment, establish themselves and expand their range of products. They show recognizable strengths in their execution of online communication and online sales promotions, and at the same time they successively enhance their logistics as the business grows. Meanwhile, traditional retailers work on closing gaps in the assortment, increase online investments and start integrating their different sales and communication channels.

Stationary food retailers retain price leadership in food retailing

In our observation, traditional German food retailers still continue to lead the price competition. Their challenge is to transfer their efficient merchandise management and logistics processes into the online world, where they inevitably have to offer more services and still want to make a profit. Store processes that were shifted onto the customer decades ago have to be performed by the retailer again, such as the physical picking of items during the shopping process. With the introduction of self-service supermarkets back in the 1960s, this was no longer done by the merchant behind the counter but by the customers themselves, pushing their shopping carts through the store. The gain in efficiency was reflected, for instance, in lower prices. Online retailing, however, means adding services like picking and packing or a convenient home delivery. Customer-oriented fulfilment services in particular have become an important competitive advantage in recent years. As it is very hard to charge extra fees for additional services in the German grocery market, retailers need to scale this business and find new approaches to making it viable. Established retailers will find ways to handle such challenges.

Make better use of online options for communication and sales promotion

However, we often notice that traditional food retailers neglect their online communication and sales promotion in the course of the effort described above. For example, a young online retailer advertises a key value item, JACOBS Krönung 500g, for 6,59 EUR using Google. At the same time, a stationary retailer runs a price promotion for the same item at 4,29 EUR - a difference of 2,30 EUR or 35 percent. As the Google search result does not bring up this promotion, the potential customer will not be informed about the offer and will probably buy the coffee for 6,59 EUR.

Example: JACOBS Krönung 500g - price advantage and customer approach
In this example, the stationary retailer could make his promotion interpretable for search engines by utilizing a shared markup vocabulary. He could explicitly mark up the promotion price, product availability and customer feedback. The algorithms of popular search engines reward and typically use this kind of information when preparing search results.
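Assuming the shared vocabulary meant here is schema.org (the name is truncated in the original text), the promotion from the example could be marked up roughly as follows; the snippet builds the JSON-LD as a Python dict, and all values besides the item name and promotion price are illustrative:

```python
import json

# Hypothetical structured-data markup for the promotion described above,
# using schema.org's Product/Offer types.
offer = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "JACOBS Kroenung 500g",
    "offers": {
        "@type": "Offer",
        "price": "4.29",
        "priceCurrency": "EUR",
        "availability": "https://schema.org/InStock",
    },
}

# Embedded in the shop page as <script type="application/ld+json">...</script>,
# this makes the promotion price and availability machine-readable.
markup = json.dumps(offer, indent=2)
```

Search engines can then surface the promotion price directly in the search result, before the customer ever reaches a shop page.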

Approach customers online at the right time and stage in their buying process

The retailer gains a digital version of the popular weekly hand-out without having to place a Google ad. The customer is already reached during his Google search, long before visiting an online shop where the buying decision has already been made. Additionally, a consumer will hardly buy the coffee online for 6,59 EUR if he only has to spend 4,29 EUR on his next shopping trip, in the store or online. The retailer can bring such online activities into the regular dialogue and negotiations with his suppliers as added value.

Reliable master data is the basis

There are many ways to be more present online and to expand the business. For us, the basis is reliable master data. We would love to help you set up your master data properly and leverage its business value.

Tuesday, December 16, 2014

Only unpacked Chocolate accepted! - Or: Why retail is struggling with incomplete item master data

Whether large or small, whether from the Palatinate or beautiful Bavaria, one thing everyone is obliged to do: maintain one's item master data fully and accurately. Yet somehow this task doesn't seem to be that easy.

For almost two years now I have been involved in an ever-growing customer project concerning the synchronization of master data in food retailing. Again and again, I am astonished at how inconsistently many suppliers maintain their item master data. Especially now, in the context of the migration from Sinfos to GDSN, I increasingly encounter the same problem: items with incomplete packaging hierarchies in the data pool.

One could almost say the suppliers don't know their own products. That they have no idea that chocolate bars come in a box where they are well protected and take up the least possible packaging space in order to be optimally positioned on the retailer's shelf. Also, the supplier doesn't seem to be aware that for transport, lots of cartons are stacked on a pallet, which is then foil-wrapped - after all, you don't want things to get broken - to be safely shipped to the next retailer by truck.
Apparently, these three packaging levels are simply unknown to them.

Or are they?

You'd think that a manufacturing company knows its products and their packaging.
But why is it, then, that packaging hierarchies are so often neglected or incompletely maintained? Could a lack of know-how play a role? Does the small chocolate manufacturer simply ignore the GDSN standard? Does he even know what a packaging hierarchy and its hundred associated attributes are? Is he aware of the additional mandatory information required by EU Regulation 1169/2011? Or does he simply see no added value for his own business in maintaining item master data electronically in the data pool?
Shouldn't he be able to maintain those few mandatory attributes in the item master data pool practically blindfolded?
In fact, this is not as trivial as one might think at first and as many suppliers assume in the beginning. There is no shortage of suppliers who are very happy to have gained access to the data pool, then ask "And where can I now create an item?", only to call again three hours later because they have realised that their products cannot be published, as no order units and no invoice units have ever been maintained.
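A packaging hierarchy and the publication check described above can be sketched as a simple data structure. The level names and flags below are simplified stand-ins for the real GDSN attributes (e.g. whether an item is an orderable or an invoice unit):

```python
class TradeItem:
    # One level of a GDSN-style packaging hierarchy.
    def __init__(self, gtin, level, is_orderable=False, is_invoiceable=False, child=None):
        self.gtin = gtin
        self.level = level              # e.g. "BASE_UNIT", "CASE", "PALLET"
        self.is_orderable = is_orderable
        self.is_invoiceable = is_invoiceable
        self.child = child              # next lower packaging level, if any

def hierarchy_problems(top):
    # Walk from the top level (e.g. the pallet) down to the base unit and
    # collect everything that would prevent a successful publication.
    problems, levels = [], []
    orderable = invoiceable = False
    item = top
    while item is not None:
        levels.append(item.level)
        orderable = orderable or item.is_orderable
        invoiceable = invoiceable or item.is_invoiceable
        item = item.child
    if "BASE_UNIT" not in levels:
        problems.append("no base unit in hierarchy")
    if not orderable:
        problems.append("no order unit maintained")
    if not invoiceable:
        problems.append("no invoice unit maintained")
    return problems
```

A complete chocolate hierarchy (pallet, case, bar, with the case flagged as order and invoice unit) passes the check; a lone case with no flags set reproduces exactly the phone call three hours later.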

So, what do we suggest?

Set precise goals: premium, complete item master data aren't simply an operative byproduct. This topic belongs on the management agenda, because the point is not only to meet retail requirements. First-class master data are a clear competitive advantage. This encompasses data provision at the touch of a button, the correct presentation of all products in trading partners' online shops, correct delivery of the demanded goods, faster reaction times in case of a crisis, better provision of information when required by environmental organizations, and of course the optimization of stocks and production.
With the right goals and consistent implementation, a supplier creates benefits for itself while almost automatically supplying its trading partners with master data.

What do we learn?

Item master data management isn't trivial. It's an issue that every company has to face. Even the smallest chocolate manufacturer should establish a suitable organization, processes and IT support.
We wouldn't want it to become the retail standard to accept only unpacked chocolate!

Wednesday, July 9, 2014

Tablet of chocolate or box of chocolate? - The confusion over the allocation of GDSN packaging types

The challenge of migrating from Sinfos to GDSN currently confronts both retail and industry within the German FMCG community. Both parties are working hard to prime their item master data for successful participation in the GDSN. Alongside the migration preparations, there are numerous smaller "sideshows" taking place, which I repeatedly encounter throughout our customers' projects.

It is precisely these supposedly minor sideshows around the GDSN migration that I will focus on in my upcoming blog series. Again and again, practice shows that smooth coordination between industry and trade partners is absolutely essential for an effective migration. The success of the implementation depends greatly on the correct interpretation of individual GDSN elements.

Let's start today with the definition of GDSN packaging types. When synchronising item master data between industry and retail, there seems to be more disagreement regarding the immediate packaging than initially expected. On the one hand, this is due to the definition of the individual immediate packaging types. For example, it is easy enough to define a 'carton', yet to differentiate it clearly from a 'folding box' is a different matter. Is it the size? The closing mechanism? Or rather the type of construction?

The abundance of existing packaging types inevitably leads to duplication and demarcation problems between two or more packaging types relating to the same item.

On the other hand, it is the perspective from which an item is viewed, and the resulting selection of the immediate packaging in the master data system, that leads to discrepancies between the industry and its retailers. The retailer primarily wants to serve its customers. Therefore, in parallel to choosing the immediate packaging, it also labels the shelf tags for the store. The industry, however, defines the immediate packaging without considering the customer's perspective at the point of sale.

The food retailer prefers to maintain a "Schogetten Alpenvollmilch" tablet of chocolate as a 'tablet' in the system. The supplier, on the other hand, is more likely to choose 'box' as the immediate packaging in the data pool, on the grounds "that the chocolate is actually packaged in a box". This in itself is perfectly conclusive. There is only one problem: the customer is used to finding "Schogetten" under the tag 'tablet' on the shelf label at the point of sale - an expectation that the retailers would obviously like to continue to fulfill.

Let's look at another product, such as the "Nivea Cream for Men". The cream comes in a glass jar and is additionally packaged in a box. This raises the question of which packaging type to select: 'box' or 'jar'? The retailer leans towards the perspective of the end consumer, who will eventually hold a glass jar in his hand when using the cream. The supplier, though, chooses the packaging type 'box', considering the item as it is distributed to the retailers.

Who is in the right? Is there a 'right' perspective at all?

According to the GS1 recommendations, the immediate packaging is also referred to as "sales packaging". In this context it can legitimately be understood as the outer packaging. The GS1 application recommendation "Efficient Unit Loads" goes even further. It defines "packaging types that are used as additional packaging to the sales packaging and are not necessary for reasons of hygiene, shelf life or the protection of the goods against damage or soiling when delivered to the end consumer" as immediate packaging, thus supporting the industry's perspective.

If the suppliers insist on their perspective, the tagging of the immediate packaging on the shelf label will no longer make sense for a lot of items. At this point, the retailer will have to consider what additional uses there may be for the packaging type provided by the industry. After all, the only useful information the immediate packaging would still give the retailer is how to display the item on the shelf.

Which immediate packaging is the right one and which the wrong remains undecided, and this question will not be easily answered in the near future. The confusion about packaging types thus remains with us; there will still have to be a lot of discussion before an agreement is reached.

In my next blog article I will address the question of why proper packaging hierarchies in the GDSN are so important and what the consequences of incomplete packaging hierarchies can be.