YouTube creator sues Nvidia and OpenAI for ‘unjust enrichment’ for using their videos for AI training
(www.tomshardware.com)
I don't think it is particularly difficult to make "ethical" AI.
Simply cite the sources you used and release everything, from the training data to the models and the weights, into the public domain.
It baffles me why they don't. Wouldn't that be much simpler?
Source: The Internet.
Most things are duplicated thousands of times on the Internet, so a list of sources would very quickly become longer than almost any answer from an AI.
But even disregarding that, take an example: stating on a publicly available page documenting the AI that you scraped Republican and Democratic websites does not explain which of them, if any, was used to answer a political question.
Your proposal sounds simple, but is probably extremely hard to implement in a useful way.
Fundamentally, an LLM doesn't "use" individual sources for any answer. It is just a function approximator, and as such every datapoint influences the result, only more so if it closely aligns with the input.
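To make that concrete, here is a toy sketch of a function approximator in which every datapoint contributes to every answer. It is a kernel smoother, not an LLM: a real LLM absorbs its training data into weights during training rather than looking datapoints up at answer time, so treat this only as an analogy. The names `influence_weights`, `predict` and `bandwidth`, and the sample data, are made up for illustration.

```python
import math


def influence_weights(query, xs, bandwidth=1.0):
    """Return one normalized weight per datapoint for the given query.

    Uses a Gaussian kernel, so every datapoint gets a non-zero weight,
    and datapoints closer to the query get a larger one.
    """
    raw = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    total = sum(raw)
    return [r / total for r in raw]


def predict(query, xs, ys, bandwidth=1.0):
    """Predict y at `query` as a weighted average over ALL datapoints."""
    weights = influence_weights(query, xs, bandwidth)
    return sum(w * y for w, y in zip(weights, ys))


# Hypothetical "training data": four (x, y) pairs.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]

query = 1.2
weights = influence_weights(query, xs)
prediction = predict(query, xs, ys)

# Show how much each datapoint contributed to this one answer.
for x, w in zip(xs, weights):
    print(f"datapoint x={x}: share of influence {w:.3f}")
print(f"prediction at x={query}: {prediction:.3f}")
```

No single datapoint is "the source" of the prediction here. The nearest one simply has the largest share, which is why a per-answer citation list is hard to define.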