Why you need to consider using small data to train AI models

AI models are only as good as their training data.

Some models benefit from large amounts of data. A good example is OpenAI’s Dall-E 2, which was trained on massive amounts of data to convert text descriptions into images. Other models don’t require a lot of data and don’t actually benefit from more of it.

The idea that small data is just as important to AI systems and technologies as big data is gaining ground. A 2021 Scientific American article by researchers at Georgetown University reports that one way to deal with small data is to first train the model on big data and then retrain it on the smaller dataset. This is called fine-tuning.
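To make the two-stage idea concrete, here is a minimal sketch in plain Python. It is an illustration only, not the method described by the Georgetown researchers or used by any vendor: the synthetic data, the one-variable linear model, the learning rate and the epoch counts are all assumptions chosen to keep the example small. A model is first trained on a large dataset, then training continues from those weights on a small dataset drawn from a slightly different task.

```python
import random


def make_data(n, slope, intercept, rng):
    """Generate n noisy points from y = slope * x + intercept (synthetic, assumed)."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = slope * x + intercept + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data


def train(data, w=0.0, b=0.0, lr=0.1, epochs=100):
    """Fit y ~ w * x + b by batch gradient descent, starting from (w, b)."""
    n = len(data)
    for _ in range(epochs):
        grad_w = 2.0 * sum((w * x + b - y) * x for x, y in data) / n
        grad_b = 2.0 * sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(data, w, b):
    """Mean squared error of the model (w, b) on data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)

# Stage 1: train on the large, general dataset.
big = make_data(5000, slope=2.0, intercept=1.0, rng=rng)
w0, b0 = train(big, epochs=200)

# Stage 2: fine-tune on a small dataset from a related but shifted task,
# starting from the stage 1 weights instead of from zero.
small = make_data(10, slope=2.0, intercept=1.5, rng=rng)
w1, b1 = train(small, w=w0, b=b0, epochs=20)

print("error on small data before fine-tuning:", mse(small, w0, b0))
print("error on small data after fine-tuning: ", mse(small, w1, b1))
```

The second call to `train` reuses the first stage's weights as its starting point, which is what distinguishes fine-tuning from training a fresh model on the small dataset alone.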

Lewis Z. Liu, co-founder and CEO of Eigen Technologies, a New York-based startup whose AI platform enables businesses to extract data from documents, said that while some fields, such as self-driving cars, require big data, many other AI applications can run on small amounts of data.

In this Q&A, Liu discusses when small data is better than big data and how it can stay relevant.

What are the benefits of small data AI?

Lewis Z. Liu: If [the data set is] small, you have more control, so you can be aware of what kind of bias or non-bias [is present]. It’s more about conscious bias vs. unconscious bias.

When is small data better than big data for an AI model or system?

Liu: I think in the case of intelligent document processing, you want to use small data artificial intelligence.

On the one hand, you have what I would call high standard, low marginal value documents. By low marginal value, I mean they are easy to automate, like passports, driver’s licenses and W-2 tax forms. These things are simple and there are a lot of them. Most Americans have a W-2 form, right? Half of Americans have passports. These are easy. Typically, you will use a big data approach because your data volume is large.

However, if you look at most invoicing processes, your finance department wants to process its invoices, but it probably has only 1,000 invoices per year. If you’re a Wall Street trader dealing in some exotic derivatives, there are probably only 200 of those derivatives issued. Or you are an insurance broker insuring residential properties, and your brokerage may get only 1,000 such property documents per year.

When you are a lawyer, banker or insurance broker looking at these documents, there are many more use cases and document types that are very valuable, but the number of documents per use case is low. So you actually need small data AI to handle all these use cases. Also, the people who review these documents are generally well paid. So you actually get what I call “low volume, high value.”

What if small data AI is not enough?

Liu: Data and documents are only part of the broader story of business operations. Sometimes that’s all you need. In other cases, you need to combine data obtained from documents with data from other sources. For example, if you’re buying a house, you need to look at title insurance, land lease deeds and the homeowner’s insurance policy. You need to collect data from all these sources, but you also need to collect data from bank accounts and all those things that don’t come from documents.

Where are big data and small data headed in AI?

Liu: This is highly use case specific. Big data sets are the future. You need a lot of data to train a self-driving car. You can’t use small data for this. However, in many enterprise applications such as intelligent document processing or automated insurance underwriting – these use cases are numerous, but they are all very specific – small data is the way to go.

If big data is the future, how can small data AI stay relevant?

Liu: Big data AI is useful for many applications, but not all applications.

Humans are versatile, and the whole reason humans are so smart is that we are small data machines. We can learn from an example or two and then do the task ourselves. If I show you two dance steps, you can probably do that dance. It is this flexibility that makes people so versatile in the workplace.

With small data AI, you have one or two or three training examples and you can train the AI to do a certain task. It is this flexibility that makes humans shine. The future of AI is that some AI systems have this versatility and can shine in this way.

Editor’s Note: This Q&A has been edited for clarity and brevity.
