Haircare with Machine Learning

Haircare lies at the center of almost everyone’s hygiene routine: over 90% of the global population uses some form of shampoo, with the global market expected to hit $38B by 2022^[i]. For many, their hair is an expression of their individuality, with unique styles and colors embodying their personality. Maintaining a vibrant head of hair, however, is a job. There are serums, tonics, conditioners, styling gels, sprays, “dry” products and more cluttering up medicine cabinets and showers around the world, and finding a routine that “works for me” often results in a lot of trial and error for the consumer. With so many options – and increasing consumer demand for “organic” or “naturally sourced” ingredients – can haircare products be made truly personal?

The idea does have some promise, and some companies already claim to provide personalized offerings based on your own preferences. Fundamentally, shampoo is not a complex product: it consists of a surfactant, which does the cleaning, and water. A variety of additives serve further goals such as thickening the shampoo, adding fragrance or protecting dyed hair. That is where the complexity lies, especially given that no two heads of hair are alike. A “personalized” shampoo needs to meet a set of challenges specific to the individual, and each challenge presents a different chemistry problem to the formulation chemist. To make a personal solution, formulators need to become more agile in how they design unique shampoo recipes.

Shampoo: Tackling Some Complex Chemistry

Let’s start at the beginning: how does hair get dirty anyway? It all starts with sebum, a blend of triglycerides, wax esters and free fatty acids secreted from glands in the scalp. Because it is hydrophobic, sebum normally protects hair and skin from moisture loss. However, excess sebum can cause hair strands to stick together and increase the accumulation of dirt, which is why we need to shampoo regularly to keep our hair from having a flat, oily look. The surfactants in shampoo, most commonly detergents like sodium lauryl sulfate, self-assemble into colloidal structures like micelles in solution (i.e. in the shower) to solubilize sebum and wash it away^[ii]. However, by removing sebum, shampoo can also strip away hair’s moisturizing coat, which is why washing hair too frequently can dry it out. As a result, many shampoos also mildly condition the hair – i.e. restore its protective hydrophobic layer – by leaving a different set of fats behind, often silicone-based oils. This raises an interesting chemistry question: how can a shampoo remove some fats yet leave others behind?

The answer comes from the chemistry of the hair itself. Hair consists primarily of the fibrous protein keratin. Due to how it is folded in the cuticle of the hair, the keratin protein tends to have a net negative charge. Conditioning agents – shampoo additives such as quaternary ammonium salts (or “quats”) – tend to be positively charged. This means they can easily bind to the negative charges on the keratin protein, and, since they also tend to have large hydrophobic tails, they can solubilize and “anchor” other additives to the surface of the hair, especially silicone oils and fatty alcohols^[iii]. These additives act as the new hydrophobic layer in the hair, smoothing out split ends and preventing flyaways and frizz.

Machine Learning Offers an Expanded View

Making shampoo personal means expanding upon this central mechanism. No two heads of hair are alike, so a shampoo that leaves thick, coarse hair moisturized and light may weigh down thin, fine hair. Additionally, some surfactants can irritate certain skin types, and many consumers are turning to naturally sourced ingredients. This wide range of requirements and new ingredients places a significant strain on formulation chemists – each new addition or change means a new recipe. To make a truly personal experience for consumers, formulators need to accelerate their output of new shampoo formulas.

Machine learning offers a way forward by allowing researchers to widen the scope of the formulations they can test. By leveraging existing formulation data and the corresponding performance data, machine learning models can predict the performance of new formulations, allowing researchers to focus on the combinations that are most likely to meet the requirements of a given demographic. Quantitative structure-property relationship (QSPR) models can help researchers identify new additives based on their chemical structure. These recommendations can guide R&D, helping formulators develop products for more personalized end markets.
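To make the idea of predicting performance from past formulation data more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three ingredient features, the "conditioning score" response, and the data itself are invented (synthetic), and a simple ridge regression stands in for whatever model a real team would choose. It is not how Pipeline Pilot or any particular company does this.

```python
import numpy as np

# Hypothetical features: ingredient levels in weight percent.
FEATURES = ["surfactant_wt_pct", "quat_wt_pct", "silicone_wt_pct"]


def make_synthetic_data(n, seed=0):
    """Generate INVENTED formulation data; these are not real measurements."""
    rng = np.random.default_rng(seed)
    X = rng.uniform([5.0, 0.0, 0.0], [20.0, 3.0, 5.0], size=(n, 3))
    # Made-up linear "conditioning score" plus measurement noise.
    y = (5.0 - 0.1 * X[:, 0] + 0.5 * X[:, 1] + 0.3 * X[:, 2]
         + rng.normal(0.0, 0.1, size=n))
    return X, y


def fit_ridge(X, y, alpha=1e-3):
    """Fit ridge regression with an unpenalized intercept (closed form)."""
    A = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(A.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)


def predict(weights, X):
    """Predict the score for each formulation (row) in X."""
    return np.column_stack([np.ones(len(X)), X]) @ weights


def rank_candidates(weights, candidates, top_k=3):
    """Return indices and predicted scores of the top_k candidates."""
    scores = predict(weights, candidates)
    order = np.argsort(scores)[::-1][:top_k]
    return order, scores[order]


# "Historical" data to train on, then untested candidates to screen.
X_train, y_train = make_synthetic_data(200)
weights = fit_ridge(X_train, y_train)
candidates, _ = make_synthetic_data(50, seed=1)
best, best_scores = rank_candidates(weights, candidates)
for i, score in zip(best, best_scores):
    recipe = dict(zip(FEATURES, candidates[i].round(2).tolist()))
    print(recipe, round(float(score), 2))
```

The point of the sketch is the workflow rather than the model: train on formulations that have already been measured, score a larger pool of untested recipes, and send only the most promising ones to the lab. In practice the response would rarely be linear in the ingredient levels, so a real project would likely compare several model types and validate them on held-out data.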

The success of such an initiative relies on the quality of the data at hand as well as the ability to train and curate quality machine learning models. BIOVIA Pipeline Pilot provides a scalable data science framework, allowing citizen data scientists to aggregate data, automate cleaning and blending of data sets, streamline the training and validation of machine learning models, and operationalize the deployment of data science workflows for their colleagues across the organization.

Check out this case study where a leading personal care company utilized Pipeline Pilot to help guide their shampoo formulation efforts.

Learn More

^[i] What is the science behind 2-in-1 shampoos and conditioners?

^[ii] C.3 – Surfactant Action on Skin and Hair: Cleansing and Skin Reactivity Mechanisms

^[iii] What is the science behind 2-in-1 shampoos and conditioners?