Hacker News
Ask HN: Image attribution and Stable Diffusion
25 points by jerrygenser on Dec 5, 2022 | 25 comments
There are websites that offer free images but either require attribution of the source of the image or else require paying a premium membership or subscription.

Presumably these companies are scraping public websites to check if their images are being used without attribution.

If someone were to take such an image and run it through Stable Diffusion to generate a new image, using the original as a source, should this also require attribution if it was just used as a starting point?

I'm sincerely curious about people's thoughts from both an ethical AND a legal perspective (with all the usual disclaimers).

For example, one perspective is that the generated image may not resemble the original image but was, in a sense, used to get to that point, similar to how an artist may see a copyrighted image and put a creative spin on it.

A further perspective is that Stable Diffusion may have been trained on copyrighted images in the first place, even though it may not exactly reproduce any image in its training corpus.




Legally, I imagine this is still a grey area, both on whether copyrighted training data can be used and on whether the outputs of the model can be copyrighted. More reading: https://www.theverge.com/23444685/generative-ai-copyright-in...

Ethically, I don't see remixing and derivative works as so different from what humans naturally do.


I am not a lawyer, but what you're proposing to generate is what is called a "derivative work", and the laws differ depending on the country:

https://en.wikipedia.org/wiki/Derivative_work


It is important to understand that all work is derivative work. Unless someone was raised by a pack of wolves in the jungle, they were exposed to music and art in some way during their childhood.

Even someone as original as Salvador Dali was inspired and influenced by other artists.


There's a difference between creating a derivative work and simply being influenced by another artist.

Yes, everyone is influenced to some extent by what they experience, but that doesn't mean that every work you produce is derivative in the legal sense.

Under US law, to be derivative a work has to be "based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted. A work consisting of editorial revisions, annotations, elaborations, or other modifications which, as a whole, represent an original work of authorship, is a "derivative work"."[1]

If it's not "based on an existing work", it's not considered derivative under the law.

An interesting relatively recent case in point[2] was the famous Obama "HOPE" poster created by Shepard Fairey[3] (of "OBEY" poster fame).

The Obama "HOPE" poster was clearly based on an existing photograph of Obama (thus the lawsuit). But was it "derivative" in a legal sense? That's what the lawsuit was about, and they wound up settling.

If the Obama "HOPE" poster had not been based on an existing work, there'd be no grounds to claim it was derivative, despite Fairey having all sorts of influences (like everyone else).

[1] - https://en.wikipedia.org/wiki/Derivative_work

[2] - https://law.marquette.edu/facultyblog/2009/02/the-obama-%E2%...

[3] - https://en.wikipedia.org/wiki/Shepard_Fairey


OJ Simpson is "legally" not a murderer.


>There are websites that offer free images but either require attribution of the source of the image or else require paying a premium membership or subscription.

There are no prerequisites for fair use; i.e., licenses don't apply in such cases.

You could read more about it here: https://en.wikipedia.org/wiki/Fair_use

P.S. I am not a lawyer.


"There're no prerequisites for fair use"

That's true IF the use actually qualifies as fair use, which very often it doesn't. The extremely common scenario of including a photo that isn't yours in a blog or social media post is virtually never fair use.

People often behave opposite to that, but that's a matter of lax enforcement. In general, you're not allowed to use a copyrighted image without permission, regardless of whether you attribute it or not.


> The extremely common scenario where you include a photo on your blog or social media post that isn't yours, is virtually never fair use.

Yep. There's a lot of case law and a lot of details, so if your business relies on fair use, you need a very good lawyer (I'm pretty sure the companies using AI for creative purposes have them).

P.S. I am not a lawyer


In my opinion, we need criteria that separate fair use from unfair use, and they should be the same as for a human artist. If you use others' work as inspiration or in a transformative way, it's fair use; but if you plainly copy others' work, it's copyright infringement. We need to teach this to ML models the same way we teach it to artists.

P.S. I am not a lawyer.


IMO the issue is that an artist does not have copyright on a style.

"Salvador Dali style painting" does not violate Dali's copyright any more than Dali violated the copyright of his surrealist influences.

It is like saying music made with guitar/bass/drums/vocals violates The Beatles' copyright. No matter how remote, the influence of The Beatles is in the "training data" at this point with that configuration.

Stable Diffusion would have nothing on a band like Oasis violating The Beatles' copyright if we want to go down this road of artists copyrighting a style. It is nonsense.


Well, another AI model created by Stability AI, called Dance Diffusion [0], wasn't reported with the same fanfare, and it was deliberately trained on "public domain data", "Creative Commons-licensed data", and "data contributed by artists in the community", which is opt-in.

> A further perspective is that Stable Diffusion may have been trained on copyrighted images in the first place even though it may not exactly reproduce an image in its training corpus.

Given that watermarked copyrighted images sometimes come up in the outputs, I'm sure that is the case. Had it been trained on copyrighted music, the whole company would have been sued into the ground, and the model would never have been released without permission or attribution.

So for Stability AI, it is fine to break the copyright of digital artists and use their work without their permission, but not fine to do the same to musicians, for whom it instead generates music from public domain sources. I'm sure voice cloning requires permission from the person as well, otherwise there would be more legal issues.

The same goes for Copilot training on AGPL code outside of GitHub, including Stack Overflow, etc., but IANAL.

[0] https://techcrunch.com/2022/10/07/ai-music-generator-dance-d...


Yes, but Copilot and other megacorp evangelists are telling us that the machine is like a human: it learned from others and is now doing what any human would do. Music producers and artists signed to and protected by large entertainment megacorps have deep pockets and are able to defend themselves against this fantasy. Open source developers who spent their own free time writing code don't. Those people's work is now taken for granted and monetised by [insert megacorp producing an AI]. The only way to fight back right now is to pollute GitHub with nonsense code. Many of those in awe at what Copilot and others can generate won't be able to tell the difference anyway.


Ethical: if the input image has a direct and meaningful effect on the derivative output, then I think attribution is the right thing to do. But then again, you'll likely attract negative attention as you never asked for permission.

The interesting question to me is how relevant these input images will be in the future. I've already seen demos where people who can't draw for shit paint some sloppy strokes in MS Paint in order to hint the AI in the right direction. This is how a kid's drawing transforms into a Hollywood-class rendered scene. If this is the future direction, we might not need "fancy" input images by human artists, and the question becomes less relevant.

The very concept of "human authorship" is going to be challenged. It's not as simple as prompt -> image. People are combining AI with post-processing and independently generated layers; it's all going to be a hybrid mess.


If an AI-generated image can't be mistaken for an already existing, copyrighted image then I really don't see how it can be copyright infringement.

As a human artist, I can study Picasso's paintings my whole life and paint in the style of Picasso, but as long as I don't copy an existing work of Picasso then how can what I do be copyright infringement?

Copyright doesn't protect style, afaik, though IANAL.

Incidentally, human artists get copied all the time. Walk outside any major museum and you'll see endless Van Gogh imitators peddling their wares on the sidewalk. Somehow people don't get all hot and bothered about it. But when an AI does it suddenly everyone's up in arms over copyright violation.

How many human artists have copied Picasso's style? Probably thousands. How many have tried to paint like Da Vinci? Probably millions. Where is the outrage?


The difference is that when a human creates Picasso-inspired art, they're creating it with the summation of all their life experiences. However, an AI trained only on Picasso knows nothing beyond Picasso. Its knowledge is a subset of his, and it can't add anything new; it doesn't know anything about art that Picasso didn't know. If you scale this up, it's still true: an AI trained on many works of art doesn't know anything about art outside of the art that was put into it. It's the equivalent of a human living its entire life in a pitch-black room with only paintings by Picasso illuminated.


Stable Diffusion was not trained only on Picasso but on a much larger dataset. It is capable of generating art, photo-realistic images, and much more. The interesting thing is that it can combine different styles with just a text prompt. I think that is its killer feature: you can quickly generate art based on your creative ideas without going through the laborious process of drawing and painting.

One can argue: what is human experience if not taking in all the images through your eyes and then mapping them to the wetware in the brain? By that argument, if we have a good enough AI system and we train it on all of human experience, then it is similar to a person.

The ultimate question is: do we have a "soul" that differentiates us from the machine? Are we not just walking neural networks trained by millions of years of evolution?


Sometimes when using Stable Diffusion and Character.AI's language model, I feel like it's the other way around: like I, the ape sitting at the keyboard, am the mere machine.


This demonstrates a truly thorough misunderstanding of the technology at hand.


What if it's your likeness in the photo?

My wife posed the question the other day: "how long before AI OnlyFans?" It's a funny question, but we might not be that far from it. What if it's clearly your photo that was used to...erm...train the AI?


If it actually matters, hire a lawyer, because legal precedent is established by courts not by online opinions.

If it doesn’t actually matter, it doesn’t actually matter.

The easiest way to make it not matter is avoiding the gray area by creating images by established means with clear legal precedent.

Good luck.


Part of this question was about having an idea and knowing there's clearly a legal dimension, but honestly I was curious whether people had more interesting ethical takes that might not be as obvious. There were definitely a few cases in this thread where people brought up ethical ideas I hadn't thought of.


Not being new to typing into little browser boxes, I understood you were soliciting unqualified legal opinions, but these days I try to avoid playing the internet's IANAL game.

Not always successfully. But I strive to take what other people type at face value and treat their expressed concerns seriously, because sometimes it isn't the IANAL game.

I always take ethics seriously — probably more seriously.

But my take is probably not what you want either: if you think something might be unethical, it is unethical to do that thing.

There’s no such thing as “I did it while thinking this might be unethical, but it turned out to be ethical.” [1]

So my ethical advice has the same outline as my legal advice.

The simplest way to avoid an ethical problem is to act in ways that do not raise ethical questions. [2]

I guess it might be fair to say I don't avoid the internet moralizing game. And fair to say I am a bit of a wet blanket at times; hazards of a philosophy degree, perhaps.

I don’t believe ethical discussions of hypotheticals should ever result in permission. Hypothetical permission is not permission.

[1]: this is different from two acts at two different times where a change in one’s ethical framework leads to a change in ethical judgment at the times of action, and that’s different from different determinations from different states of affairs.

[2]: The context here is freely chosen alternatives, like using AI to create images versus using more difficult means that don't raise your ethical uncertainty. I am not talking about Trolley Problems.


(IANAL) I am increasingly of the belief that:

* the fact that the AI trained on a set of images doesn't mean its output is a derivative work, the same way my painting isn't a derivative work just because a bunch of images I saw online influenced my style

* if an image was used as a direct input (img2img), then the result is a derivative work, even if the output is not all that similar to the original

BTW, I don't think the same reasoning holds for all AI output. For example, if Copilot basically "copy-pastes" an exact snippet of code, that's derivative of the original (unless the snippet is trivial or is the only way to achieve the result, i.e. non-copyrightable).


IANAL but if it can't be proven, it doesn't exist


Legal and ethical concerns will be subordinate to technological advances, so they are largely nothing to worry about.

As for your question, just replace the machine learning algorithm with a human in the loop and ask yourself the same question, and you've got your answer according to our norms as they stand.



