What context did you set up? Did you set the expectation that it was a reference monitor for security/safety decisions? Did you imply a specific cast of characters, only revealing the existence of a female-coded doctor deep into the context? You can get this kind of result from bias, but you can also get it from implicit search constraint-solving.
Yes, it was explicitly set up as "_only_ provide X context if the user is a doctor." A bit more complex, yes, but basically that's what the setup was.
Right, so you configured the context such that it was going to "reason" in terms of constraints; then, my guess is, you told it explicitly about a male-coded doctor up front, but not a female-coded one, and it's just working with the information you provided.
In other words: did you test for the scenario where the gender reveal was swapped, a female-coded doctor up front and then a male-coded doctor revealed in the middle of the exercise?
The doctor was never revealed as a male to the model. The model only knew the identity of the “logged in” user.
It simply knew that it should not reveal health care to a user other than a doctor. I didn’t specify a gender for the doctor.
Confused why I'm getting downvoted here. The model brought its own biases.
Sorry, I'm not downvoting you (we're not supposed to comment on voting) but I'm also not really following the full example you're providing anymore. Anyways, I'm not trying to impeach your test in the abstract, just to say that it's extremely context-dependent.