Synthetic Data is Secure and Inexpensive
Machine learning algorithms rely on data that has been meticulously categorised by hundreds of humans, who must manually examine it and explain its meaning to the machine.

By Apac CIOOutlook | Thursday, October 13, 2022
Much as a child learns by modelling a small set of parents, machine learning algorithms learn from examples; but instead of parents, they rely on data that has been meticulously categorised by hundreds of humans.
FREMONT, CA: Machine learning algorithms are trained on data meticulously categorised by hundreds of humans, who must manually examine each example and explain its meaning to the machine.
That approach is tiresome and time-consuming, but it is not the only problem with using real-world data to train machine learning algorithms.
To distinguish legitimate insurance claims from fraud with accuracy, an algorithm must see examples of both, each in the tens of thousands. And because AI systems are frequently supplied by outside vendors rather than operated by the insurance companies themselves, those third parties must be granted access to all that private information.
The situation for algorithms trained on text, images, and videos is more complex yet equally unsettling. Beyond copyright concerns, many creators object to having their work ingested into a data set used to train a machine that could eventually replace part of their employment.
In a 2016 report, RAND Corporation researchers calculated how many miles a fleet of 100 autonomous vehicles, driving 24 hours a day, 365 days a year, at an average speed of 25 miles per hour, would need to travel to demonstrate that their failure rate resulting in fatalities or injuries was reliably lower than that of human drivers.
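As a rough sketch of the fleet arithmetic in that scenario (using only the figures quoted above, not RAND's full statistical analysis, which estimated the miles needed for confidence in the failure-rate comparison):

```python
# Fleet arithmetic from the scenario above: 100 vehicles driving
# around the clock, every day, at an average of 25 mph.
vehicles = 100
hours_per_year = 24 * 365       # 8,760 hours of driving per vehicle
avg_speed_mph = 25

miles_per_year = vehicles * hours_per_year * avg_speed_mph
print(f"{miles_per_year:,} fleet miles per year")  # 21,900,000 fleet miles per year
```

Even at nearly 22 million miles a year, RAND's point was that such a fleet would still take implausibly long to accumulate statistically convincing safety evidence from road miles alone.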
Fake Data can Help AIs Deal with Real Data
Even before the RAND research was released, companies developing autonomous vehicles knew they lacked the means to collect enough real-world data to train algorithms to drive safely in every situation.
Consider Waymo, Alphabet's self-driving vehicle firm. Instead of relying solely on its real-world vehicles, it built a completely virtual environment in which virtual automobiles with virtual sensors could drive indefinitely while generating usable data. By 2020, the company claims, it had gathered data on 15 billion miles of simulated driving.
To get technical, this is what AI jargon calls synthetic data: data suited to a given scenario that is not obtained through direct measurement. In other words, AIs are creating fake data to help other AIs learn about the world more quickly.
There are several methods for creating synthetic data, but generative adversarial networks, also known as GANs, are the most popular and well-known.
In a GAN, two AIs compete against one another. The first creates a synthetic data set, while the second judges whether the generated data looks real. The second AI's feedback flows back into the first, teaching it to produce ever more plausible fake data. You have probably come across one of the several this-X-does-not-exist websites, which generate their images with GANs and cover everything from humans to pets to buildings.
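To make that feedback loop concrete, here is a minimal sketch of adversarial training in plain NumPy. Everything in it is a deliberate simplification invented for illustration, not any production GAN: the "generator" has a single parameter (a mean it shifts its samples by), and the "discriminator" is a one-variable logistic classifier. The tug-of-war, though, is the real mechanism: the discriminator learns to tell real samples from fakes, and the generator uses the discriminator's feedback to make its fakes harder to spot.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" data: samples from a Gaussian centred at REAL_MEAN.
REAL_MEAN = 3.0

# Generator: g(z) = mu + z, z ~ N(0, 1). Its one parameter, mu, is trained.
mu = 0.0
# Discriminator: D(x) = sigmoid(w*x + b), a logistic classifier.
w, b = 0.0, 0.0

lr_d, lr_g, batch, steps = 0.05, 0.05, 64, 2000

for _ in range(steps):
    real = rng.normal(REAL_MEAN, 1.0, batch)
    fake = mu + rng.normal(0.0, 1.0, batch)

    # Discriminator step: ascend the gradient of
    # log D(real) + log(1 - D(fake)), i.e. get better at telling them apart.
    d_real = sigmoid(w * real + b)
    d_fake = sigmoid(w * fake + b)
    w += lr_d * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    b += lr_d * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator step: ascend log D(fake) (the non-saturating GAN loss),
    # i.e. shift mu so the fakes fool the current discriminator.
    fake = mu + rng.normal(0.0, 1.0, batch)
    d_fake = sigmoid(w * fake + b)
    mu += lr_g * np.mean((1 - d_fake) * w)  # dg/dmu = 1

print(f"learned mean: {mu:.2f} (target {REAL_MEAN})")
```

After training, mu ends up near the real distribution's mean: the generator's fakes have become statistically hard to distinguish from the real samples. Real GANs replace both one-parameter models with deep networks trained by automatic differentiation, but the alternating generator/discriminator updates shown here are the same idea.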