Debugging your parenting skills: Is training a model similar to raising a child?

Nikolina Kosanović

Let’s start with the obvious – no artificial neural network will ever be able to “push your buttons” as relentlessly as your child, but it also won’t bring you as much joy.

Humans are emotional beings, which is one of the most important things that separates us from AI and robots. But even emotions – or rather, managing them – are a skill we develop constantly, along with many others. Like learning how to be a great ML engineer, or how to be a decent parent. Or not…?

To become a machine learning engineer, you complete some formal education, perhaps pursue additional courses, and acquire the essential tools to tackle challenges in the field – much like any other job.

However, there are no safety instructions, instruction manuals, or education for parents. I found this hard to accept.

I have an amazing little 18-month-old girl. Her arrival has turned my life around. While trying to understand her behavior, I searched for answers in the latest psychology books. Much of this literature draws from cutting-edge neuroscience research, unveiling striking resemblances between the workings of her remarkable young mind and the neural network models I work with.

The artificial “brain”

The building blocks of artificial neural networks were based on the functioning of the human brain. One of the first attempts to mimic biological neurons, in the late 1950s and 1960s, was the perceptron. Dendrites receive signals from other neurons, which are then integrated and processed within the neuron. Similarly, perceptrons receive input signals, apply weights to these inputs, and produce an output based on a threshold or activation function. Neurons and perceptrons play crucial roles in information processing and decision-making within their systems.

However, perceptrons had limited ability to handle complex problems due to their linear nature and inability to learn non-linear patterns. Artificial neurons today are typically more complex than perceptrons. While perceptrons are simple binary classifiers operating with a step function, artificial neurons in contemporary networks often incorporate non-linear activation functions. That is, the math behind it is a little more complicated.
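To make the difference concrete, here is a minimal sketch (plain Python, with made-up weights) of both: a classic perceptron with its hard step function, and a modern artificial neuron with a smooth sigmoid activation.

```python
import math

# A classic perceptron: weighted sum of inputs, then a hard step function.
def perceptron(inputs, weights, bias):
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# A modern artificial neuron: the same weighted sum, but passed through a
# smooth, non-linear activation (here a sigmoid) instead of the hard step.
def sigmoid_neuron(inputs, weights, bias):
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))

# With these particular weights the perceptron behaves like a logical AND
perceptron([1, 1], [1.0, 1.0], -1.5)  # -> 1
perceptron([1, 0], [1.0, 1.0], -1.5)  # -> 0
```

The sigmoid neuron outputs a value between 0 and 1 instead of a hard yes/no, which is what lets networks of them learn non-linear patterns.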

But what does this have to do with the human brain and kids?

During training, specific variables (the already-mentioned weights, plus other numbers called biases) are adjusted at each step to minimize the loss function – to get closer to the goal of a trained network.

Similarly, the human brain also changes; in fact, it is changing all the time. As we learn, some connections between neurons strengthen, while others grow weaker as we forget some information. Every time we remember or repeat something – we are strengthening those connections. When mastering a new skill, we are changing our brain the same way the neural network is “changing” its parameters while training to solve a business problem.
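Here is a tiny sketch of what “changing parameters to minimize the loss” means in practice: a single weight fitted by gradient descent on invented data that follows y = 3x, so the weight should converge close to 3.

```python
# Minimal gradient-descent sketch: repeatedly nudge a single weight w so
# that the mean squared error between prediction w*x and target y shrinks.
def train(samples, w=0.0, lr=0.1, steps=100):
    for _ in range(steps):
        # gradient of mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        # step against the gradient: "strengthen" or "weaken" the connection
        w -= lr * grad
    return w

# the invented data follows y = 3x, so w should end up close to 3
w = train([(1.0, 3.0), (2.0, 6.0)])
```

Every update makes the prediction slightly less wrong – repetition and small corrections, the same principle the rest of this article keeps coming back to.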

How exactly do we teach someone a new skill? One way would be by using supervised learning.

Supervised learning

Supervised learning in ML is when the model is trained on labeled data. We know what the “correct” answer is. Each input has a corresponding correct output. For example, classification problems, like training a model on images of cats and dogs to predict which animal is in the image, or regression tasks, such as predicting car prices from elements like the car manufacturer and model.

The goal is for the model to learn the mapping between inputs and outputs, allowing it to make predictions or classifications on unseen data.
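As a toy illustration of learning a mapping from labeled examples, here is a one-nearest-neighbour “cat vs dog” classifier. The feature vectors (say, weight in kg and ear length in cm) and the examples are entirely made up for illustration.

```python
# 1-nearest-neighbour: predict the label of the closest labeled example.
def predict(labelled, query):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(labelled, key=lambda item: dist(item[0], query))[1]

# invented (weight_kg, ear_length_cm) features with known correct labels
training_set = [
    ((4.0, 7.0), "cat"),
    ((3.5, 6.5), "cat"),
    ((20.0, 10.0), "dog"),
    ((25.0, 12.0), "dog"),
]

predict(training_set, (5.0, 7.0))   # -> "cat"
predict(training_set, (22.0, 11.0)) # -> "dog"
```

The labeled pairs are exactly the “input with a corresponding correct output” the definition above describes; the model’s only job is to generalize that mapping to unseen queries.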

A supervised classification model we worked on at Njuškalo Technology is part of the “Reveal” feature. Reveal was designed to help users submit the product they are selling by suggesting the product category to them.

You can try it when you submit an ad on the Njuškalo mobile app. When you add a photo, the model suggests a category based on that photo. This image-classification model was trained on millions of labeled images from our dataset. The feature also shows the product price estimation, giving the user a sense of the possible profit.

Sometimes, it can be hard to tell the exact category from only an image. When you have a photo of a house – is it a house for sale or rent? Not even a human would know without seeing the text of the ad. Therefore, we also trained the model to use ad titles to improve its prediction.

After submitting the image, if you start typing the title of the object you are selling, the model will suggest categories, taking into account what you typed.

While working on training those supervised models for the categorization of ads, I also had an example of supervised learning at home: we were teaching our daughter to recognize animals. This job also required lots of data – many picture books and repetitions.

Dangers of small datasets – overfitting

LLMs (large language models like ChatGPT and Llama) and other deep learning models are hungry for input data, but kids are, too! Every new experience is something they learn from. So, when we have a small dataset, this can sometimes lead to overfitting.

Overfitting happens when a model is excessively complex relative to the simplicity of the underlying data, leading to high performance on the training set but poor performance on unseen test data.
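A minimal illustration, with invented points that roughly follow y = 2x: a degree-2 polynomial (the “complex” model) fits the three training points perfectly, yet on an unseen point it does worse than a simple straight line.

```python
# three slightly noisy training points following the trend y ≈ 2x
train_pts = [(0.0, 0.1), (1.0, 1.9), (2.0, 4.2)]
test_pt = (3.0, 6.0)  # unseen point from the same trend

# "Complex" model: Lagrange interpolation passes through every training point
def interpolate(points, x):
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# "Simple" model: least-squares line y = w*x
def line_fit(points):
    return sum(x * y for x, y in points) / sum(x * x for x, _ in points)

w = line_fit(train_pts)
interp_err = abs(interpolate(train_pts, test_pt[0]) - test_pt[1])
line_err = abs(w * test_pt[0] - test_pt[1])
# zero training error for the interpolant, but a larger test error than the line
```

The complex model memorized the training set (noise included) instead of learning the trend – exactly the failure mode described above.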

I noticed this happened with certain classes of animals my daughter was trying to recognize. The only example of a mouse we had was from one coloring book. Hence, it was not surprising that when I showed my daughter a new mouse drawing, which was also of a different color – she did not classify it as a mouse. It was like her neural net had overfitted – she knew what a mouse was on the training data, but she failed to generalize well on the test data.

We solved the problem the same way you would with an overfitting model – we added more data and showed her different images of a mouse. Now, I can proudly say that she excels at recognizing animals, especially mice, and yells “squeak, squeak” when she sees one.

If only all problems were this easy to solve! 🙂

Most of the time, this is how we imagine our kids are learning – we say something, they listen or repeat. So nice and orderly, right?

WRONG!

It turns out that this is only a tiny fraction of how kids learn. Learning is much messier.

Unsupervised learning

Unsupervised learning models aim to find patterns in data without explicit labels.

For example, grouping similar ads or finding what similar users looked at to recommend items are examples of unsupervised learning we used to enhance Njuškalo user experience and increase buyer-seller matchmaking!

Underneath each ad, you can find similar ads, which can help you find the product you are looking for. As a result, sellers have their products exposed in more ways than just through the basic search, and buyers have a nice grouped view of all similar products from which they can choose.
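A grouping like this could be sketched with k-means, one of the classic unsupervised algorithms. The 2-D vectors standing in for ad embeddings (and the starting centroids) are invented for illustration; no labels are involved anywhere.

```python
# Bare-bones k-means: alternate between assigning points to their nearest
# centroid and moving each centroid to the mean of its cluster.
def kmeans(points, centroids, iters=10):
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid's cluster
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[idx].append(p)
        # update step: move each centroid to the mean of its cluster
        centroids = [tuple(sum(v) / len(v) for v in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

# made-up 2-D "embeddings" of four ads
ads = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (1.0, 0.9)]
groups = kmeans(ads, centroids=[(0.0, 0.0), (1.0, 1.0)])
# similar ads end up grouped together, with no labels ever provided
```

The model was never told which ads belong together – the structure emerges from the data itself, which is the whole point of unsupervised learning.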

Unsupervised learning is a considerable part of how humans learn. The amount of information that our brain processes from only passively observing the world is enormous! When we consider that kids constantly interact with their surroundings, exploring the consequences of their actions – this is something that AI cannot do. At least for now.

The fact that unsupervised learning can help discover some unexpected patterns that might not be captured through supervised learning inspired the new V-Jepa model from Meta [2].

As humans, much of what we learn about the world around us—particularly in our early stages of life—is gleaned through observation. Take Newton’s law of gravitation: Even an infant (or a cat) can intuit, after knocking several items off a table and observing the results, that what goes up must come down. [2]

The goal of V-Jepa is to build models that can learn as humans do and have a contextual understanding of the world, based on an internal mental model of it.

If you want more details on this topic, I recommend listening to this podcast [3] or reading the post [2].

Everything our kids experience influences their internal mental world model. During the first few years, parents play the most significant role and thus have the most powerful impact. Since this model of the world is built mainly by unsupervised learning, combined with the fact that what we say is the smallest part of our communication – how we behave will probably influence them the most. Therefore, what you do, especially when you are not explicitly trying to teach them something, matters a lot. We (parents, teachers, family, …) are creating their training data by our behavior.

Data quality – how clean is your training set?

Our brains physically change each time we learn something new. This phenomenon, known as brain plasticity, suggests that what we focus on and do frequently has a significant impact.

If we think of everything we do and see, and how we interact with the world, as input data for “training” our brain, it may give us a different perspective.

Suddenly, all that you do seems more important, doesn’t it?

When it comes to children, this is even more pronounced since their brains develop extremely fast. In the first three years of their lives, billions of connections are made every second! [5]

It is a well-known fact in ML that if your training data is dirty, you will get garbage as an output, as well. Dirty data means any data that has duplicates, unexpected outliers, missing values, errors, or inconsistencies. To name a few examples:

  • Mislabeled data – imagine you are training a model to recognize if an image contains a human, a cat, or a dog. The labeler hired to label your data did not drink coffee that morning and wrongly labeled some pictures of dogs as cats and vice versa.
  • Unexpected values – sometimes, when working with tabular data, some values in your dataset could be missing, have different types than expected, or have some unexpected characters that could influence your model, sometimes even without anyone noticing.

That is why so many data preprocessing and cleaning methods exist.
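A hedged sketch of a few such cleaning steps on tabular rows – dropping duplicates, missing values, and unparseable entries. The field names (“title”, “price”) and the sample rows are invented for illustration.

```python
# Clean a list of ad-like rows: deduplicate, drop missing values,
# and coerce prices to numbers, discarding anything that will not parse.
def clean(rows):
    seen, cleaned = set(), []
    for row in rows:
        key = (row.get("title"), row.get("price"))
        if key in seen:                  # duplicate row -> drop
            continue
        seen.add(key)
        price = row.get("price")
        if price is None:                # missing value -> drop
            continue
        try:
            row["price"] = float(price)  # coerce unexpected types
        except (TypeError, ValueError):
            continue                     # unexpected characters -> drop
        cleaned.append(row)
    return cleaned

rows = [
    {"title": "bike", "price": "120"},
    {"title": "bike", "price": "120"},  # duplicate
    {"title": "sofa", "price": None},   # missing value
    {"title": "desk", "price": "n/a"},  # unexpected characters
]
cleaned = clean(rows)  # only the first bike survives
```

Real pipelines are far more involved, but every one of them boils down to decisions like these: what counts as dirty, and what to do with it.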

Should we be doing the same in our private lives? Well, it probably would not hurt. What could be considered dirty data in parenting?

  • Mislabeled data – a parent is trying to teach a child to behave like A while behaving like B. For example:
    • Yelling at a child to teach them to be calm (to stop yelling)
    • Swearing while trying to teach them good manners
    • Being glued to our phones/devices and not present, while trying to teach kids that extensive use of devices is not good for them.

We often forget that we are sending conflicting information – it is not acceptable to yell/swear – but I am yelling/swearing.

  • Inconsistencies – it was alright to eat three ice creams yesterday, but today it is not.

It is indeed hard to be consistent. However, we should at least acknowledge we are doing it and take into account that “training” that particular skill will take more time since we are not sending consistent data.

Do bear in mind – there is no such thing as a “perfect” dataset, just as there is no “perfect” parent. It is the intent that counts, along with having realistic expectations.

Unwanted behavior – when the models misbehave

While trying to change unwanted behavior, repetition and persistence are key. This is true for ML models and kids alike.

Let’s say your classification model performed poorly on one category – the next thing you should do is check if data was mislabeled for that category. Next, try to gather more data and train it again. Repetition and persistence.

Something similar is done to teach ChatGPT and similar models to behave and to reduce the number of hallucinations. More data and examples are created for the situations they handled inadequately, and the model is fine-tuned to respond better next time.

Bear in mind that ML scientists claim hallucination is not an accurate term. Confabulation would be much closer to what actually happens. Confabulation is a term from human psychology that refers to the generation of plausible-sounding but potentially inaccurate or fabricated information without the intent of deceiving anyone. It is a common characteristic of AI language models when they produce responses based on limited or incomplete knowledge. [6]

The training phase – the brain is changing!

Any ML engineer will tell you – training a deep learning model takes A LOT OF TIME. With the increasing amount of data needed to train models, and the use of different modalities – pictures, video, and text – training time keeps growing.

But any parent will also tell you – raising a child takes even longer (sometimes it may seem to last forever).

Recent neuroscience studies have shown that it takes decades for the prefrontal cortex (part of the brain responsible for decision-making) to mature. Contrary to prior assumptions, full maturity is not reached at ages 18 or 21, but rather around 25 to 26 years of age.

Conclusion

Hence, next time your toddler throws a full plate of soup at the wall, take a deep breath and remember – their brain is still in training. Each unwanted behavior is an opportunity to teach their little neural network new skills. It is like you just got new training data! Use it wisely.

When they hit puberty, it will be like putting your model into production because your input stops being relevant to them. At that point, you need to hope you have done your best during the training phase and that they will perform well enough in the real world on their own.

Now, whether you are trying to train a model or going to spend the rest of the day with your kid – I wish you good-quality data and a LOT OF patience!

Sources

[1] https://www.quora.com/What-is-the-differences-between-artificial-neural-network-computer-science-and-biological-neural-network

[2] V-Jepa https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/

[3] Yann Lecun: Meta AI, Open Source, Limits of LLMs, AGI & the Future of AI | Lex Fridman Podcast

https://www.youtube.com/watch?v=5t1vTLU7s40

[4] canva.com

[5] https://www.amazon.com/No-Drama-Discipline-Whole-Brain-Nurture-Developing/dp/034554806X

[6] https://arstechnica.com/information-technology/2023/04/why-ai-chatbots-are-the-ultimate-bs-machines-and-how-people-hope-to-fix-them/