twitter xp 

wow ok thats p good, but it missed a spot

maintains context and is able to adjust and iterate!

ok now lets reformat this to make it easier to test out...

woah what happened there?

it's not quite right, as the actual cause is a weird rendering issue, but its suggestion for fixing it is fascinating! is it aware that its output is on a webpage, and those can be interacted with?

v proactive to rewrite the script w/o being asked, too
script looks p good!

that script will probably work tbh, but we have nowhere to run it. lets see if we can fix that...

this is pretty good, albeit a bit too simple

it can iterate in response to me invalidating its assumptions about my context! pretty cool

alright now lets take it up a notch...

this is more or less right, besides that it misunderstood how to use AWS credentials; very very close tho. already past what most of my juniors at work could have done with this level of specificity

lets see if it can fix that...

it didn't, claimed it did, and even explained what additional steps need to happen, violating my instructions. interesting, but oh well

alright so now lets stop managing our own database, that gets old quick.

p good so far, but the response kept getting cut off repeatedly, each time i reran. i asked it why and it was able to acknowledge this, theorize as to why, and even continue from where it had left off!

it didn't use Terraform for RDS, but I also didn't ask it to. interesting that it didn't assume I wanted that; this is similar to how a junior may behave.

this one took some iteration as it kept getting truncated, but eventually it gave me something good

oddly enough, in one of the later iterations of trying to get around the truncation, it got the "right answer", but "forgot" that I had asked it to write a script, and reverted to a plain language explanation. indicative of contextual limitations?

had to cobble this one together across several carefully designed prompts; seems like human oversight and guidance is still very necessary for more complex tasks

cool, it was able to handle this slightly more complex bash usecase too!

it does perpetuate a mistake from a previous iteration where it assumes the old database also uses the credentials for the new one

on the plus side it's able to explain what it did and why very well!

asking it to do something weird and kinda complex, it does great! with one minor mistake that it's able to correct. it also fills in the ambiguous parts of my ask with reasonable assumptions. we're definitely getting out of junior engineer territory 😳

it started erroring at this point so i'll stop bullying the poor thing. mildly surprised & p impressed with how well it performed at this complex series of related tasks! clearly it's not perfect & not too great at remembering details across iterations, but better than expected

one last thing; lets see if it remembers what we did!

the first result is good but seems too similar to my precise wording...
the second is better, but just reworded
the last is decent but too simple, and misses the iterations. but good enough!

where does this notion of a typical prompt come from? has it been retrained to include its own conversations?

so, it's even able to do meta-analysis and contrast with characteristics of previous experiences; where is this coming from? I find this particularly surprising tbh
