Hello guys,
I have created the free app PiccyBot that speaks out the description of the photo/image you give it. And you can then ask detailed questions about it.
I have adjusted the app to make it as low vision friendly as I could, but I would love to receive feedback on how to improve it further!
The App Store link can be found here:
https://apps.apple.com/us/app/piccybot/id6476859317
I am really hoping it will be of use to some. I previously created the app 'Talking Goggles', which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully more useful!
Thanks and best regards,
Martijn van der Spek
Comments
Re: Multiple images
@Blindpk: Nope. PiccyBatch was for processing and adding descriptions to a large number of pictures. What I am looking for is something simple, like what we have in Be My Eyes: an add picture button, so you can snap a new picture, compare them, and so on.
Re: Gokul
Oh yes, you're right, that is what PiccyBatch is. The first part of my previous message stands though (and yes, I would also like to be able to add/process multiple images).
Multiple images
Blindpk, I will look into multiple images, but at the moment it may complicate things (some models support them, some don't; how to mix video and images; etc.). It's on the list, but for later.
Instagram should be fixed
Inforover, thanks for the alert, the Instagram video sharing to PiccyBot should work again. Note that only posts from public accounts can be shared.
Apple connect is working fine again, so please try again Hasajaza, it should work.
New models
Over the last weeks or so a few new models have been released:
OpenAI: ChatGPT-4o-latest. This is not really new, but it currently seems to outperform the standard GPT-4o at image descriptions (at a higher cost).
Google: New versions of Gemini Pro and Gemini Flash (I don't know how their API is structured, but maybe it already uses them; they end in -002, I believe).
Meta: Llama 3.2 in one bigger and one smaller variant.
If you haven't already, could you check whether any of these can be (and are worth) implementing in PiccyBot?
Update on new models
Blindpk, thanks for asking. I am working on adding Llama 3.2, that should be available this week. The Gemini models PiccyBot uses are already the latest.
Regarding ChatGPT-4o-latest: I'm looking into it. OpenAI lists it as a text-only model like o1, which is a bit confusing. The rate limits for it are also very low; that seems to be a mistake on OpenAI's part.
But definitely keeping an eye on any new models to add to PiccyBot to improve it further!
Still waiting for PC access :)
Now that Be My Eyes/AI allows PC access for Windows users, I'm bringing this up again: it'd be super, super nice if I could access Piccy via my PC. It's my favorite photo-describing app by far on my iPhone, but labeling/renaming photos on the iPhone is tedious at the best of times. If I could do my photo sorting on my PC, it'd be way faster and easier on my nerves :)
Re: Update on new models
Thanks a lot for the info! Very strange about ChatGPT-4o-latest; it is definitely an image model. It's in the lead on Chatbot Arena's vision leaderboard, for one thing :) and I use it for images with my API key.
PC and models
PrivateAI, thanks for asking. I am thinking of creating a browser plugin for PiccyBot. I just have to work out how to handle the free and pro versions. I don't want to deny you guys the best features, but I don't want to go broke either.
I'll definitely look into ChatGPT-4o-latest, Blindpk. It is very likely a glitch in the OpenAI documentation.
Sharing from TikTok, Facebook, YouTube Shorts, YouTube and Snapchat
As requested by you guys multiple times, I have added support for sharing videos from TikTok, Facebook, Snapchat and YouTube. Just open the video and use the share feature to send it to PiccyBot.
I'd appreciate feedback on this to get it working perfectly! Please let me know how it works for you. Note that it is still limited to shorter videos only, but I could increase the length later once it is all working stably.
PC version and question about sharing from Facebook
Having PiccyBot available on PC would of course be great! As I understand it, some here use Chrome and some Firefox, so if you make a browser plugin it would ideally support both. Of course, you must look for a good and sustainable financing option.
Then a question about sharing from Facebook, do I understand it correctly that this is only for videos and not for images?
Keep up the good work!
Sharing from Facebook
Blindpk, correct, I gave priority to video sharing from Facebook before images. You can already describe images in multiple ways, and you could save one to your phone and describe it from there, so there is a workaround. But I will add direct support for that as soon as the rest is stable.
Thank you
Yes, that sounds like a reasonable priority; as you say, there are workarounds for images, even if they are a bit annoying (especially since no description apps that I know of can describe images directly from FB specifically).
I've tested the video feature and here are my thoughts.
I know for a fact this took a lot of work to implement, so I don't want to take away from that in the slightest, but at the moment it's like I'm still checking out a still image.
Would it be possible for the video to play while the AI provides an audio description track? It would probably have to be trained for that, and I honestly have no idea where to begin with it.
Or could it play a bit of the video, explain that bit, play the next bit, and so on?
This is interesting, but I couldn't run a YouTube video through it and sit back and relax knowing the AI would describe the things I'm missing visually, at least not yet. I hope it can get there one day though.
I tried it with a couple of Shorts and it works, but it feels a bit... I'm not sure, gimmicky? I don't like saying that, but at the moment that's how I feel.
Perhaps you could contact the NFB (I think they describe things for Americans, but I'm not sure as I'm in the UK) and ask if they'd allow you to train your model on older AD scripts, if those exist.
Ideally what I'm looking for is an app on my PC where I can put a video through it, wait a while for it all to be put together, and then have it described in a professional manner along with the video playing.
I really think a donate button would be great in the app. Even though I'm giving quite negative feedback, in my opinion, I'd love to donate a bit more to this app.
Also, there doesn't seem to be a way to restore my purchase. I know I bought the app, but I keep removing it, and when I download it again it acts like I'm downloading it for the first time. Is that something that needs to be worked on, or have I just not found the setting?
Also, you'll want to label the help button on the settings screen something a bit more helpful than what it is at the moment.
Not getting this app?
Hi! I tried this app at work today. I'm very sad to say that my first impression was awful. It gave me a basically correct description of what was in the picture, but added lots of flowery nonsense that was unnecessary. When I asked it to describe the man in the picture, it refused to give me any details about what I looked like or what I was wearing.
I tried another picture and asked it to read some printed info to me. Again it described other things but refused to describe what I asked for. Again and again.
Not sure how this app could be helpful? Am I doing something wrong?
@GayBearUK nope.
If you go into settings and turn the personality off, it improves things, but I personally wouldn't use it to describe stuff.
The video thing is great, and the developer really does care about making the app the best it can be, but for me, I wouldn't use it to describe things.
Then again, perhaps it's because I was born blind; I don't need to know all this stuff.
Perhaps it's because I was born blind, but I find it a bit too much. It describes so much stuff that just isn't needed. That's not the developer's fault at all; all these AI models do that. You can go into the settings and lessen the tokens used to get a shorter description, but I just find it's a bit too personal?
No, personal isn't the right word, but I don't know what else to call it.
For example, if I'm looking at a Harry Potter short, I don't need to know that character X is formidable, or that character Y is ready for action; that doesn't really tell me anything. Put it this way: you can tell it's written by an AI.
I think it really needs to be trained on AD data to get the tone right.
GB and Brad
When I tried it, you had to pay to be able to tone it down. I objected to that, so I use something else, but lately the work the dev has been doing is starting to make me think about changing my mind.
Brad, those things mean something to me, probs to anyone who has been able to see. Like colours and perspective and shadows, I suppose.
I'm not sure about this app, but with others you can create your own prompt to, for example, not use any vision-related words, or not talk about people, etc. It really can improve things.
Also, OpenAI launched vision fine-tuning; I wonder if someone will use that to create better models for the blind?
Here's what I got for a Harry Potter Lego short.
The video is an animated re-enactment of the Battle of Hogwarts from the Harry Potter movies, using Lego figures. The scene opens with a large Lego Hogwarts castle, with Death Eaters flying overhead. The sound of battling and explosions can be heard in the background.
The first character we see is a Lego Harry Potter, who is standing in the center of a courtyard with other Lego characters. Harry is looking distressed and is holding a brown, pointed hat in his hand. He is dressed in a grey and white sweater and blue jeans.
Then, we see a Lego Voldemort, who is standing in front of a group of Death Eaters. Voldemort has a pale white face with dark, piercing eyes and a mischievous smile. He is wearing a long, green robe with splashes of brown dirt.
The camera moves to a Lego Bellatrix Lestrange, who is standing next to Voldemort. She has long, black, curly hair and a stern expression on her face. She is wearing a black dress with a necklace and black arm bands.
Next, we see Harry draw a silver sword, which is similar to the sword of Gryffindor. He is looking determined and ready to fight.
The video then shows a series of close-up shots of Lego characters battling, using magic and swords. We see Harry and Ron Weasley, who is also dressed in a school uniform, fighting alongside each other, showcasing their close friendship. There is also a Lego Hermione Granger, who is looking fierce and confident as she points her wand at the enemy.
Later, we see Voldemort and Harry fighting on top of one of the towers of Hogwarts. Harry throws a spell at Voldemort, who dodges it and then lunges at Harry. Harry tries to push Voldemort off the tower, but Voldemort grabs him. Finally, Harry manages to push Voldemort over the edge, and he falls off the tower, creating a huge explosion.
The video ends with a shot of the Lego Hogwarts castle.
It's cool, there's no doubt about that.
I know it seems I'm going back and forth on this, and I think that's because of the way it's formatted at the moment. If it could be trained on audio-described data and play the video with the audio, I think that would make a massive difference, although it would have to learn, somehow, when to insert itself like an audio describer would.
The best part about this is that all of these feelings of mine are going to change next year; this stuff will keep improving, and who knows where we'll be.
@Charlotte Joanne I'd personally pay for it even if you don't use it. Why? Because this dev really does care about feedback, and they understand alongside us that this is a new technology. They really are trying to please people, even people like me who can be quite negative at times.
I've been a bit negative in my last posts, but don't get me wrong, this app is amazing; I just can't see myself using it out and about at the moment.
I really do want a donate button in the app. I'd honestly donate a lot to it even if I don't use it; the more money the dev has, the better their app can be.
@the dev
I can't seem to play the audio attached to my mail when sharing with PiccyBot. Is that a bug or just a me thing?
I'm using VLC to play it, if that helps.
The AI models.
I think a little description for each model would be nice: why would I want to use model X over model Y?
If that's already in the question mark thing at the top left (at least that's where it is for a blind user), great! I just thought I'd throw it out there.
The more I listen to AIs write, the more there's a pattern. At the moment I wouldn't say they write like a human does. I really do wonder where we'll be with that this time next year.
Oh my goodness Brad, that is fantastic isn't it!
As I said, I don't use the app. That description was totally amazing!
It was!
Now, it's not for me, but once they manage to sync the description with the audio/visuals of the video, if that's possible, then it will be great, and if they make an addon for Firefox, I'll pay for it straight away.
If possible I'd like more voices, but that can be added later, if at all.
Please focus on the video side of things and on syncing audio and video together if you can.
@brad
I love the amount of description you get; I think it's great. I want a picture painted in my head, and the more info it gives, the better, IMO. I was also born blind, so I think this is a very subjective thing.
As for an audio-description-like experience, I think that'd be crazy difficult to implement. You'd have to have it understand the context of the video to put the description in the right places somehow, and that's going to require a lot of work on Martijn's end, if it's even possible at all.
Love the idea of a donate button on the app as well.
@Brad So what do you use the app for?
Thank you, but what do you use the app for then, if not descriptions? Maybe I'm being thick here, but I'm not getting what the benefit of this app is. Just trying to understand.
Subjectivity.
I guess the amount of detail one prefers in a description is entirely subjective. I was born blind, and prefer rich, detailed descriptions covering the colors, tone, etc. almost 70% of the time. It's only when dealing with work-related data or something similar that I prefer a concise approach. But maybe a setting to determine what kind of description one needs might be nice.
Born blind here as well
I was born blind and I am loving the detailed descriptions this and every app gives. I think it is a person-to-person preference. I am going to test this app on some YouTube Shorts; this is a great app!
Oh it totally is a preference thing.
@GayBearUK, I don't really use the app.
I try it, find a feature to be neat, and then delete it.
@Inforover oh, it would be really hard to do. I don't think we're there yet, but I do think we'll get there one day, and it won't be far off.
Sharing from dropbox broken?
After updating to the latest version, when I try to share a photo from Dropbox, whether via "share" or "export", Piccy shows "fetching data", then "please wait", and then sits there forever doing nothing, like it's frozen.
Add Indian regional languages for recognising images
Hello developer, this is Kaushik from India. Your app's accuracy is very good, but we need this app to recognise Indian regional languages so that we can use it seamlessly. Please also try to bring in a feature to read PDFs and other book formats at the best affordable rate for everybody. Recently in India, iPhone purchases have increased in our visually impaired community. Do consider this. Thank you.
Dropbox and regional languages
Kaushik, at the moment PiccyBot supports the Indian languages Hindi, Bengali, Gujarati, Haryanvi, Marathi, Punjabi and Sindhi. Will add further languages in due time, when usage from those regions goes up.
Privateai, I released an update today that should fix the dropbox and whatsapp sharing issue. Hope it all works ok again!
Regional languages
@Martijn, do consider adding a couple of South Indian languages while you are at it, because I know for a fact that there are a reasonable number of users from these parts to make the effort worthwhile.
Sharing images from email
Sorry if I've missed this, but is sharing an image from email broken?
I'm using iOS 18.0.1, and when I share to the pixies it makes the waiting noise but never seems to get past it. I ended up using Be My AI, which worked OK, but the descriptions weren't as verbose as I get with this app.
(By the way, I hope the dev doesn't find it annoying that I always refer to this app as the pixies. By the time I realised it wasn't called Pixie Bot it was already cemented in my brain as that. And I kinda like that it has a pet name. And given my username I can say it's definitely not meant as demeaning or anything like that.)
It's like having your very own Dev Shop
The way this app is improved, almost on request, is so very impressive!
It really is.
If we get that donate button, I'll use it.
I can't imagine where this will be this time next year, or the amazing stuff that will come out by then. I can't wait!
re: Dropbox and regional languages
Thanks for fixing it :) I store most of my photos in Dropbox, so I was sad that I couldn't use it :)
Thoughts
I want to start by saying I really love this app and the fact that it can describe videos. It’s an incredible feature, and I find it super useful. However, I do have some feedback that I think could make it even better.
Right now, the limitation is that the app can only describe 60 seconds of video. I understand the challenges behind processing videos for descriptions, especially when the app needs to download and handle the video on the device. However, I wonder if there could be a way to work around this. For example, what if we could watch videos directly from platforms like YouTube and somehow screen-share or sync them with the app to receive real-time descriptions?
As a blind person, I really appreciate being told what’s happening in a video, but it’s hard to know how frequently scenes are changing or what exactly is going on during more dynamic content. It would be great to have a way to know the timestamps for when events happen and how often things change from one scene to the next.
Another issue I’ve come across is that, to my knowledge, we currently can’t mute a video on YouTube and still have it described. I think it would be incredibly helpful if we could mute the original audio on videos, particularly for things like music videos, and have the app provide the description instead. This way, I could choose to listen to the video’s audio when I want, but also have the option to mute it and have the app describe the visual content for me.
I hope this feedback is helpful and that it’s something that can be looked into in the future. Thanks so much for all the hard work that’s gone into this app—I really appreciate it.
Email pt 2
I tried opening the same image from my email later on and it worked fine, so it must have been just a temporary glitch. Sorry, I was a bit trigger-happy with my post yesterday.
Increased video duration and Reddit support
Thanks for the feedback guys! Winter Roses, I will increase the length of video that can be processed. I already did it for the Android version; the next iOS release will have at least double the duration for pro users.
I have also added support for sharing Reddit videos; there were quite a few requests for that. If any of you have suggestions for more specific video sources that would be helpful to describe, let me know.
Text length in chat
Not sure if it's set this way, but when I use the Ask More feature, the responses are rather short. I prefer this app's long and detailed descriptions, and would like similar length in chat, according to our setting preferences. Also, sometimes on long descriptions the text cuts off before getting to the end. Is there a way to put in a "continue" button or something to prompt it to finish from where it left off? From experience using chat AIs, I know often all you have to do is type "continue" into the prompt, but when I did that, Pixxy simply re-analysed the photo and generated a new description rather than continuing the previous thought.
I had a hilarious moment…
I had a hilarious moment with this app. I took a selfie, and I do have a lot of skin tags, and it said "That man must be in a lot of pain and discomfort with those skin lesions", lol.
Bug: severe truncation on all answers in the chat interface
No matter what I ask in the chat interface about an image or video, all answers get severely truncated. This seems model-independent, as it is the same with Claude 3.5 Sonnet on images and also for videos, which use another model. Answers are truncated to nearly the same length for images (about 40-50 characters), and to a somewhat longer length for videos, but in the latter case the truncation is also very severe.
Instructing it to continue doesn't help at all. In that case the initial description is reiterated, but also truncated severely. So I don't think the truncation occurs at the model level; instead it happens somewhere between the model and the displaying of the answer. What I get as an answer shows that it would be completely coherent and appropriate had it not been truncated so badly.
I set the length parameter in Settings to 100%. As I am a lifetime subscriber, I have access to that screen and could adjust it.
I have the latest version (2.4). I use PiccyBot in Hungarian; however, I seriously doubt this has any significance regarding this truncation phenomenon.
Unfortunately this bug renders the chat interface (invoked through the "Ask more" button at the bottom of the main screen) practically unusable. Otherwise I love the app very much!!!
Updated app with chat interface fix
PrivateAI, Laszio, thanks for the feedback! An app update is available that should fix the chat responses. They should be medium length and take into account the information already given. Hope this works ok, let me know what you think?
re: Updated app with chat interface fix
Works great so far! Thanks for the fix! I want to take this chance to ask if we can adjust the volume of the AI in the app itself. Currently, my default VoiceOver is way louder than the AI, so when I have my normal volume, I can hear VoiceOver but can't hear the AI. Then I turn the volume way up on my phone, and now I can hear the AI but everything else is way too loud, LOL.
Voiceover volume
If you add VoiceOver volume to the rotor settings, you can change the volume of VoiceOver relative to the overall volume of the phone. That also works on the Apple Watch.
--Pete
2.5 update
I got the 2.5 update with the chat interface fix. Although I've only had it for about an hour, I managed to test it in English and Hungarian, both with Mistral Pixtral and the video description model. I can report that I am satisfied: I've seen no truncation in the chat answers, they seem to come through fully, and yes, they are to the point. I hope it stays this way, and thanks for the quick fix!!!
I've experienced only one strange thing with the 2.5 update. There is an edit box one or two right flicks from the top of the main screen. I call it the prompt edit box, as it contains the instruction that mainly guides the image/video description process, so it qualifies as a prompt in AI terminology. Before this 2.5 update, if PiccyBot was set to Hungarian, by default the prompt edit box contained "Mi van ezen a képen?" (what's in this picture?) for images and "Mi van ebben a videóban?" (what's in this video?) for videos. Now by default this prompt edit box seems to read "Kérdezz a piccybottól a kérdéseddel" (literally "ask PiccyBot with your question", a sentence that definitely sounds clumsy in Hungarian), which is not appropriate Hungarian prompt text. Nevertheless, descriptions seem to work okay this way too, so far. PiccyBot is so versatile that I simply haven't gone through all the combinations I use this app for with the new update yet: pictures and videos taken on the fly, pictures and videos from my gallery, and also those shared from other apps, like Mail. So I don't know yet whether this prompt text thing is really a bug or not. Time will tell.
One more thing about Hungarian. Though every part of PiccyBot seems to support it quite well, Hungarian cannot be selected from the supported languages list on the settings screen: it is not listed among the dozens of languages there, nor can it be found with the search edit box on that screen. So I access it with the "phone system language" setting, and it works that way. This is only a very minor nuisance that can easily be fixed in one of the future versions along with other bugfixes.
All in all, with the chat interface fix seemingly in place, PiccyBot is really a bright gem in my "vision toolbox" on my phone: many models, many languages, extremely diverse possibilities. So thanks very much!!!
Video descriptions stopped working
Late on Tuesday night (29 October), video descriptions stopped working abruptly and haven't come back since. After the waiting sound, "server error" is displayed where the description should appear. This is independent of language: I tried several languages and the result is the same. I suspect an API change on the side of the video description model. I have ruled out other regular causes of such a disruption: the network etc. are all fine.
By the way, I noticed by accident that PiccyBot now lets me record a video over one minute (I am a lifetime subscriber). Thanks very much for that!
Video disruption
Laszio, there was indeed a server issue on Tuesday, but it should be working OK now. Can you try restarting your phone and PiccyBot and then try again?
I will be adding backup services for these situations when one provider goes offline.
Success!
Thanks very much! Closing PiccyBot from the app switcher and then starting it again was enough to get it working again. I was quite sure that I had tried that simple remedy before, but it turns out I hadn't.
By the way, after the app restart, video descriptions come through in a drastically different style than before the server error. I know well that each generated text has a slightly different style and characteristics, but this time the difference is much more pronounced. The video description is more compact, has a more straightforward style with fewer details, and I experience many more hallucinations than before, quite radical ones indeed. I haven't changed anything in the settings.
Have you somehow changed which model does the video descriptions or what may be going on?
Video update
Laszio, thanks for confirming that this is a workaround for now. I have not made any changes in the setup on my end, but on the models' side things seem to have changed. I'll be working on that over the coming days to get back to a fully stable and reliable setup. My focus has been on getting real-time voice to work in PiccyBot, but this gets priority now.
Real-time voice
That sounds interesting. I'm not much for talking with tech but in this case I might give it a shot when it is ready.
There are also some new models out now that might be of interest: a new version of Claude 3.5 Sonnet, and a model called Molmo that is said to be quite good with images (in addition to Llama 3.2 and ChatGPT-4o-latest, which I mentioned before and which might be implemented already; it's been a while since I checked).