Those of you who have watched the live stream, do come over and let's talk about how incredible the demo seemed, how much of it will translate into actually usable features, and what this means for accessibility and assistive AI, if I may. And those of you who haven't, do go watch the stream. It's INCREDIBLE!
Comments
mac app
From what I understand, you will be presented with a popup on the main site when you get access.
I'm not really understanding why this is a gradual rollout since it wouldn't be putting any additional strain on servers or anything.
I'll also update the thread as things develop.
GPT-4o: my observations
Hi all,
This post will probably be a long one and maybe a bit rambling. First, I can't wait for the video feature.
All that expression is awesome, but I imagine you can tell it to tone it down a bit, like the guy in the video with the lullaby was doing when he told it to sing in different ways. I've been messing about with GPT-4o for the last week.

You can get it to browse the internet for sources. Every time I ask it something now, it either gives me sources automatically, or I ask it to look something up on the internet just to be sure it's using up-to-date material, and it still provides me with sources. It always provides links to sources, but these links aren't accessible with VoiceOver on iOS by default. I have to tell it to make the link a clickable element so I can activate it with VoiceOver, and more often than not I have to ask it several times. When I use GPT-4o on my Windows computer, the links are displayed as links, no problem.

It won't read an entire webpage to me; it just insists on summarising, even though I can take a screenshot of a page and it seems to read all the visible text, including the ingredients for a recipe.

Now to editing images. I took a screenshot of a Word document on my computer and sent it to GPT. I told it to get rid of all the icons at the top, the toolbars, the desktop stuff and so on, and just leave the text in the Word document visible. It did get rid of all the icons and the additional stuff you'd usually want gone if you were editing a screenshot, but it altered the font, the words ended up looking distorted, and there were spelling mistakes because it added extra letters and deleted things too. So, all in all, not a success. Oh, and it provides you with a link so you can download the edited image. Sometimes this link works, but sometimes it forgets to upload the file, and when you click on the link it tells you 'file not found'.

When getting it to browse the web, I asked it to look for something on Amazon, and it said it couldn't. When I asked why, it said that Amazon has tools to stop automation and scraping.
I don't know how true that is. I'm not sure if it's good at finding exactly what I want from a specific website. I asked it to find the latest recipes from a site I like, and it didn't find the latest recipes. It seemed to go a few pages deep for most of them, instead of just looking at the homepage. I can't wait to try and use the video feature when I'm out and about. Maybe I can get a lanyard. I don't know how comfortable that would be. I saw a video on YouTube where somebody put two phones with the video capabilities together and the two GPTs were talking to each other and singing.
Check this out.
https://www.youtube.com/watch?v=MirzFk_DSiI
I can't wait to get my iPhone and iPad talking.
Could try that
Hi Olly,
Even when you can eventually click on it, it doesn't actually display as a link that's viewable with the rotor; it's not a link at all. You just have to double-tap on the text which says 'View recipe from BBC Good Food' or something like that. So you have to navigate by line to find the text, or sometimes even by word. But maybe I could try customising it too; at least it might work consistently even if it's not a proper link. I asked it how it was coding the link, and it told me it was using the 'a href' tag, which is the correct way to mark up a link, but whether it was actually doing this, who knows?
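For what it's worth, the difference being described can be shown with a tiny sketch (the helper function here is hypothetical, not anything ChatGPT actually runs): text wrapped in a real `a href` element is what VoiceOver announces as a link and lists in the rotor, whereas a bare URL pasted as plain text has to be hunted down line by line.

```python
# Minimal sketch of accessible link markup. Wrapping the URL and label in an
# <a href> element is what lets screen readers like VoiceOver expose it as a
# link element, rather than plain text.
from html import escape

def make_link(url: str, label: str) -> str:
    """Return an HTML anchor element that assistive tech treats as a link."""
    return f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'

print(make_link("https://www.bbcgoodfood.com", "View recipe from BBC Good Food"))
```

Whether the model reliably emits markup like this in its chat responses is a different matter, as the post above suggests.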
Soon, they say.
Okay, my ChatGPT app said that the new features of 4o will be rolled out to me soon when I randomly opened it yesterday. I wonder, how soon is soon?
Humane already has it on…
Humane already has it on their AI Pins, and it has improved their speed.
They won't stop bragging about it on Twitter.
It's a shame that the product isn't accessible, all because they insist on that laser ink display.
Time to see what Apple has in store for us. I doubt it'll be as impressive as this.
They haven't exactly had the wow factor for a while now, not since they introduced the M1 chip.
Vision Pro did sound impressive, but not if you're totally blind.
Humane
Humane is an absolutely terrible product, lol. I've seen the s**t show on YouTube, X, and Facebook, read articles, and spoken to a couple of people who had it. Yes, I said had. It's a terrible product. Better do your research.
Social Media vs The World
Fun fact: Social media is not good research material. That is about as bad as using Wikipedia as a "credible" source when citing other people's written and/or spoken works. There is (sadly) an old adage, or expression if you will, that goes, "If it's on Facebook, it must be true". That can be said for every social media platform ever developed. Ever.
Social media has two fundamental purposes: first, to allow anybody and everybody to voice their own, often misguided, opinion on any given topic; second, to target the part of the human brain that controls addiction.
I said before that everyone should be patient and let this technology properly develop, so that we, as consumers, can have a wonderful experience with said new technology. Yet all I am seeing, here and elsewhere, is people bickering that "they" do not have their automagical AI companion yet.
How about, ya know, instead of demanding instant gratification, everybody just takes a step away from their keyboard for five minutes and just. . .
Breathe.
Ollie
I'd call it the sorting hat.
OpenAI and accessibility
I would not bet on OpenAI thinking about accessibility. While it is true that they have highlighted the benefits for blind people, which is great, the web interface for ChatGPT still does not have labelled buttons after being available for a year and a half. That is such a simple thing to fix, but it has not been done (I have reported it multiple times and so, I believe, have others).
Sorting Hats & Automagical Wands
So, we have the sorting hat, and perhaps one day a viable haptic pen. Now if we can just get the fine folks at "Glidance" to redesign their model to look more. . . broom-like, we may have a winner winner, chicken dinner!
OpenAI and accessibility
It's quite broken. I tried signing up on their website and, on Edge, kept getting told that my password doesn't meet their requirements; on Firefox, I got nothing.
Turns out I was using my old email address, which I'd used in the past to sign up with them before deleting the account, after realising I wouldn't use the service as much as I thought I would.
It also turns out that once you delete your account, your email still stays around. I think this is weird and makes no sense to me whatsoever.
I've emailed them about this and got a response saying how they're committed to accessibility (I don't believe them, but we'll see), and that I can't use that email, though they might be able to do something about it in the future (I don't believe they will, but we'll see).
I've asked them why emails aren't deleted and haven't heard anything back yet. I did mention that I live in the UK; I think there might be European laws that could help me, but I'm not sure.
The not-deleting-your-email thing honestly really bothers me. I was under the impression that once you remove your account, everything is removed.
Siri does magic
You can already do spells with your iOS device.
Just say "Lumos" or "Nox" to turn the light on and off.
Must be one of those silly little quirks Apple put in to make Siri less boring.
Re: Be My Eyes on Windows
As far as I know OpenAI will release a Windows app "later this year". Will be interesting to see what functionality that and the Be My Eyes apps will offer.
Re: Hey meta
@lottie, you could actually make all the HP spells into a set of prompts that would accomplish different things, as weird as that would be.
Join Be My Eyes Beta?
Any idea how to join the Be My Eyes beta on iOS?
Just for GPT-4o, I am willing to run a beta version.
In general, how can we test?
Hi all, not just for ChatGPT, but can we join the Be My Eyes beta via TestFlight if we are serious about helping out?
virtual photographer?
Imagine if it could help take photos as well?
E.g., I tell it I'd like to take a selfie.
It directs me on how to move the camera, counts down, snaps the photo, then stores it in the app's photo gallery, similar to Aira.
Something like this would be amazing, especially for those of us who struggle to get good shots.
I also love nature, so something like this could be immensely helpful.
Even though I have never had usable vision, I do love taking and saving pictures.
Like, Google Frame, only more advanced?
Having an AI camera assistant would be very cool. Right now, at least for iPhone users, the closest thing we have is to FaceTime video call someone, and let them take a screenshot of whatever we are pointing our camera at.
Would definitely give a sense of independence and an opportunity to explore a (not so) new concept: blind photography.
When will this be out?
OMG, I can hardly wait any longer! How many weeks until this is available? Will it be available in ChatGPT? Or Be My Eyes? Or both? I am a paid subscriber to ChatGPT.
@Brian: Photography
AI could probably help some to get a good shot. I think the zooming would be helped a great deal, and cropping would be made possible for blind people, where it is not currently.
The iPhone does have the leveling haptics and tilt instructions that help. It also tells you if and where a face is in the viewer. You can use your own face and a tripod to figure out where the edges of the picture will be and so on, and so on... But it's an exhausting pain to go through all that. It would be nice to take a burst of wide-angle pictures and work with the AI to crop and enlarge the subject of interest into a reasonably good photo.
I still think it would be useful to cross-check multiple AI descriptions from different apps to get a better idea of the actual photo. There was a video of two of them talking to each other somewhere on the site or this thread, but I think they were the same app.
Re: photography
With 4o's multimodal capabilities, we should already be able to do this. It should be as easy as giving a prompt instructing it to help you capture a good snap of whatever you want captured, including people. Also, the more detailed your prompt, the better the picture captured will be, I'm guessing. But the ability to save this picture into the gallery might not be present in the GPT app, as no one there is likely to have thought of such a use-case. Maybe we'll find a way around that. I'm also excited about getting on board the photography bandwagon.
photography
Just as long as there was some way of storing the image to be shared later.
I think the process of getting it to take the photo would be easy enough, it's just the matter of somehow saving it.
I think this would be more of a niche feature, at least in the beginning.
Quinton
It already lets generated images be downloaded and shared with other apps. I wonder if it'll do the same with captured images if prompted correctly, once the camera functionality rolls out, that is. If that can be done, will it also create the image with a small description attached as metadata, again on being prompted properly? In other words, is it a magic wand?
Slight diversion
We've been talking about use-cases where this will impact our daily lives; we've discussed how this will be a fun toy, snapping pictures and such; but what do y'all think will be the impact of this on visually impaired people being employed? Will it make us more competent in professional settings, especially ones where you're required to deal with visual information?
editing photos
Hi,
I asked it to edit a screenshot I created. The edit didn't go well at all, I said this above I think, but it did give me a finished result as a link to download from. The link in the iOS app wasn't accessible by default with VoiceOver, I had to coax it a bit to get it to work. The links it generates come out much better in a browser where they're displayed like normal links. I think taking a photo and accessing it for later use will be doable.
AI in professional settings
While there may be situations where the carefully-considered, supervised use of AI can be beneficial for accessibility in a professional setting, AI alone is not, in my opinion, up to the task of meaningfully addressing unemployment among blind and low vision people for several reasons.
One is that AI models can and do hallucinate, meaning that at least for me, I am not comfortable making professional decisions based on AI-generated descriptions of required visual information that I otherwise can't access and verify. In addition, I would not want prospective employers to get the impression that AI tools are a replacement for presenting information in a natively accessible manner for the reason mentioned above. In fact, I believe that if such attitudes took hold among prospective employers, colleagues, and clients of blind and low vision professionals, it could potentially have the opposite effect of increasing access to employment and further exacerbate existing disparities. I know that I wouldn't want people's impressions of my qualifications to be tainted by their feelings toward AI or the limitations of such technologies.
Finally, the sharing of confidential data with an AI model, even if the developer's privacy policy states that it is not retained, could be potentially problematic in certain industries, like education, healthcare, and law.
Re: Release pushed back and AI in professional settings
Ollie, what source do you have on that?
Tyler, I agree with you on this one. There might be a few situations where AI could be useful, but generally speaking I don't see it having an impact, at least not in the short term. When AI becomes more of an integrated part of business applications, and companies find safe ways to utilize it, then it might be different.
Patience vs Perseverance
Hey Ollie,
I've been saying the same thing for the past several days, though maybe not so bluntly. As for productivity in an educational/professional environment, I am all for it. Yet I do not believe for one minute that AI is quite there. That does not mean I have any concern that it will never reach that milestone; rather, I think everyone needs to realize and understand a fundamental fact: this is all essentially new technology.
We all want this to be a huge success, and we all have our reasons/needs/desires as to why, but consider that this may not reach any of your goals for months, perhaps even years from now. Everything we have read or listened to concerning AI and LLMs is, for lack of a better description, glorified media hype; designed to get everyone excited about a "what if" scenario.
So, praise and plead, insist and demand, and discuss and promote "your reasons" why AI will be all that it can be.
...but stop demanding a release date on something that is nowhere near ready to make all of your hopes and dreams come true.
Glorified media hype-pothesis
I have a bit of a disagreement there. Let me illustrate it with a use-case. Some three months back, if you had given me a worksheet with 500 rows of data and a tonne of graphs, charts, and pivot tables on it, as a blind person, in most cases I wouldn't have been able to make much sense of it. Even if I were brilliant with numbers, and had a thing for visual representations of data, I would have had to struggle through the process, taking ten times the time taken by my sighted counterpart.

Today, I can input the same data, in any of its forms, into GPT-4o or Claude Opus and get any kind of analysis or insight in almost the same time as my above-mentioned sighted counterpart. I'm only limited by the kind of questions I can think of asking. What's more, with 4o, I can turn the same worksheet data into any visual representation I can imagine, such as graphs or charts, with quite a high degree of fidelity. I could then use those representations to make compelling PPTs, again using AI, again with a reasonable degree of fidelity. The most I'd need to do is ask someone sighted to take a look, because of the "trust but verify" principle. And all this I can do in almost the same time as my sighted counterpart employing the same technology. Did I think I had the potential to be this productive three months ago? No. I'm looking forward to a time when AI can take map data and represent it to me in a form other than audio. And it isn't that difficult.
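On the "trust but verify" point: the basic numbers behind such charts can also be cross-checked locally with a few lines of code, so an AI-generated analysis doesn't have to be taken entirely on faith. A minimal sketch, using entirely made-up figures (all names and numbers here are hypothetical):

```python
# Illustrative sketch with made-up figures standing in for a 500-row
# worksheet. Re-computing simple summaries like these locally is one way to
# verify the numbers an AI model reports back from the same data.
sales = {"Jan": 120, "Feb": 135, "Mar": 150, "Apr": 110, "May": 160}

average = sum(sales.values()) / len(sales)
best_month = max(sales, key=sales.get)

print(f"Average: {average}")        # Average: 135.0
print(f"Best month: {best_month}")  # Best month: May
```

The same idea scales up: spot-check a handful of the model's claimed totals or maxima against the raw data before building a presentation on top of them.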
@Brian
I'm not sure AI will make any of my hopes and dreams come true, that usually involves unlimited wealth and magical powers...
One thing I wonder is how chatting with AI will affect how people interact with actual people. I suspect you can be extremely rude and verbally cruel to the AI and it will continue to be happy-sounding and cheerful. I know some people who are already on the edge of being that way with actual people. Will the AI give them a swear jar to dump their abuse into, or will it just end up getting them used to ripping up the frail social contract that keeps us all from going at each other's throats?
That's great, Gokul
Do (you) have access to ChatGPT 4o? If I were to present (you) with said example of a chart/graph/etc., could (you), with accuracy, give me the relevant information I may happen to need, at any given time?
Can (you) do this now?
Somehow, you have managed to both make my point and miss it entirely.
I said this already, that we all (myself included) have ideas of what (we) will do with the technology once it is widely distributed, but for now all (we) can do is debate the (im)possibilities based on conjecture and content.
My point was not about never achieving independence and success in our daily and/or professional lives. My point was, as I said before, to take a step back and just, breathe.
This technology is coming. This is a fact based on everything that has so far been reported/announced. However, impatiently posting demands to have access to 4o right now, is not going to make it actually get into your hands any faster. This is also a fact.
Please consider that.
how select?
I wonder how they will select alpha testers. Surely blind people's needs for the visual elements will be more specific and targeted than sighted people's?
Will
They aren't going to be able to differentiate between blind and sighted people.
What I'm wanting to find out is when Be My Eyes gets access to the API.
I'm excited to watch this develop and improve while I wait for access since there will most certainly be videos about this as it slowly begins to roll out.
Brian
In answer to your questions: yes, yes, and yes. Let me explain myself. As a paid user of ChatGPT Plus, I have access to some of the features of 4o, which include the improved data analysis/interpretation capabilities. Even before this, with GPT-3.5, you could do a lot of this, but with 4o it's become much more accurate, capable, and stable, to the degree where I can confidently say I'm "number-independent" in my workplace, so to speak (do note that I deal with charts, graphs, dashboards, etc. on a daily basis).
I can do it now or anytime, because I have access to my account both through my PC and through my phone. What's better, I can take an image of any data (including drawings) on the phone and share it with the app.
Well, I wouldn't say I could give you a 100% accurate analysis if you were to present me with a chart/graph/worksheet, because it would be arrogant of even the best data analyst out there to make such a claim under perfect conditions, and I am nowhere near that. Like I said, I am always limited by the kind of questions I can ask and the kinds of things I can imagine. But having said that, yes, I can make an attempt within reasonable human limits.
months away i guess
If the product with visuals isn't out for months, so let's say the autumn or the end of this year, I doubt we'll see it in Be My Eyes until it's fully released. It seems nobody but them knows the answer as to when we can try it, but I doubt it'll be before the GPT app itself, to be honest. It's a bit of a let-down, though, showing that off, and in my view presenting it as a ready-to-test product, only to hear "wait three, six, nine-plus months". By then people may just see it as "oh look, the image features are here" and forget the hype. We won't know how good all this is until real-world testing, and they are constantly refining the model. Obviously, through user testing it will get better with each passing day, but I doubt we can even try it in external apps until it is actually released by ChatGPT themselves.
I'm not bothered.
They had to take down a voice and might be getting sued. They're good, but arrogance will be anyone's downfall.
Re: months away i guess
Exactly.
iPhone 4o
I imagine a world with iOS 18 and Siri 4o.
In some famous celebrity's voice, of course. Can't wait to check out the "new" voices in iOS 18.
Blah blah
OpenAI is going to be fine. The drama between Scarlett Johansson and OpenAI is pointless. I've compared them side by side, and they didn't use her voice, just someone else who sounded similar.
Not saviors, no
They're just a big tech company that has grown from almost unknown to one of the biggest players in a really short timespan; of course there will be setbacks and bad decisions along the way, as for all other companies (the whole Sam Altman circus in November, and now the Sky voice situation, for example). I don't see them as better or worse than many others in that regard. Ollie, what do you mean by "the intentional misdirection of what is coming and when"? Feels like I have missed something there.
Regardless of how I feel about the company, the fact is, that they currently have the best technology for AI description of images. As the other big players (Google, Meta, Anthropic, maybe Microsoft too) release better models that advantage might disappear, but for now, if we want good image descriptions, OpenAI's models are the ones to go to (seeing the performance of Gemini and Claude in JAWS just reinforces that sentiment). Personally I hope that more models will reach the quality of OpenAI, the more alternatives we have the better, and it is not a good thing being "locked" to just one.
If it is SJ
If the voice is SJ, so what? At least she's in the limelight. In any case, it will be based off a voice of some sort.
Big tech company
Exactly! The point we most often miss in such discussions is that AI, LLMs, whatever, are just another tool. And whoever makes them doesn't do so out of altruistic intentions; they need to be conscious of the economics to stay afloat. Expecting that they will, on their own, adhere only to ethical practices is just hoping for a false paradise. It therefore becomes the conscious duty of the end-user to be aware of that fact, while deriving the maximum possible benefit from whatever is available out there.
An app that guides you to take decent photos
The app lets not only visually impaired but also sighted users take good selfies using the rear camera, with the screen facing away from the user, so that the captured photos are of higher resolution. It detects faces and various other objects, and provides spoken prompts to position the device or whatever is (intended to be) captured. Here's the link:
https://apps.apple.com/tr/app/eyesense/id1353368137
Note that the app is maintained by a company based in Türkiye and does not appear to support any languages other than Turkish as of yet, though contacting the developer about better localisation/language support is always an option. The thing is, you can just have the sentences translated, or memorise them as they are, as there are only a few of them.
Re: Ollie
Yes, I can kind of see why people think that. For me it was pretty clear that what they showed was going to take some time to get out, with all this talk of "alpha testers" and so on.
No, they are not a nonprofit but rather a "capped-profit" organization, as I understand it (they did, however, start as a nonprofit). So while their company structure is not like other big tech, they more or less operate under the same conditions (and this is one of the criticisms they have faced: that they have become focused on the commercial side, which was not the intent when they started).
I turn my back for two weeks...
And all this happens - bit of a whirlwind it seems.
The demos are incredibly exciting. While away, I was trying to find a good way to get a sense of where I was. The Meta Ray-Bans were OK at feeding me piecemeal details, and Piccy Bot was good at giving me a lot of detail after the event, but it was all a long, long way from the ducks video and having something actually appreciate what was in front of me.
The one thing I have very mixed feelings about is the personality. I'm not sure I've ever felt the need to become best friends with my computer before. I guess I will enjoy using it, but I wonder what the social impact is of being able to basically create your own perfect personality who will happily flirt with you at every opportunity, laugh awkwardly at your unfunny jokes, and generally just act like you are the best person in the world. I wonder what impact this will have on relations with actual people. A little like learning everything about the opposite sex from pornography: it's not going to lead to very healthy relationships.
I was really interested to hear it interpret graphs. This is something I struggle with at work - I just ask a sighted colleague to help me see what is there. I guess I can probably do this already but it was pretty amazing to hear it working like that.
I was listening to a Verge podcast on this last night, and they were saying that they thought the AI wasn't so much getting smarter as getting more convincing. It has a certain Weird Science feeling about it: nerdy computer programmers creating their own perfect woman in a lab. It was also interesting to hear their take on Google's move to AI, and its shift away from providing you access to other people's information to basically telling you what to think.
Anyway I will definitely give it a go when it is available for free. If it was usable from my Meta Ray-bans then I would be subscribing already.
Don't talk to me like I'm your AI...
@mr grieves, as I posted before, I'm more concerned with the so-called AI personality taking certain people's foulness with a seemingly happy attitude, followed by those certain people getting used to treating actual people, such as me, the same way they interact with the AI. We haven't seen any videos showing how the AI responds to... a manipulative sociopath, for example, or how it responds to constantly being told it is useless/worthless/stupid/under-programmed.
I remember back when I first started using Siri, I kept thanking it after its responses without thinking about what I was doing. Then someone told me that was silly, so I stopped. It did say "you're welcome", though.
Cool demo of the voice and vision functions of ChatGPT 4o
Hello,
Just found this video on youtube:
https://www.youtube.com/watch?v=VnHrr1v0GEM
That's Ember
That voice is already in the ChatGPT app for iOS, the same as Sky was. But of course, in the video, Ember is a lot more expressive. I actually like that voice. I imagine they won't bring Sky back, though. I like the way it told the user which route to take on the Underground.
Now this, on smart glasses...
Could literally be a game-changer in terms of accessibility. Yes, I know it's dangerous to depend on AI while travelling; you could lose your internet connection at any time. But think of navigating safe indoor spaces like airports, railway stations, shopping malls...
Anyone heard Sky reading long texts?
Oh, I wish I could get her to read all my books. It's the ultimate TTS I've always dreamed of.