A surprising number of the entries for AI are about generative models that don’t generate text or artwork—specifically, they generate human voices or music. Is voice the next frontier for AI? Google’s AudioPaLM, which unites speech recognition, speech synthesis, and language modeling, may show the direction in which AI is heading. There’s also increasing concern about the consequences of training AI on data that was generated by AI. With less input from real humans, does “model collapse” lead to output that is mediocre at best?
- RoboCat is an AI model for controlling robots that learns how to learn. Unlike most robots, which are designed to perform a small number of tasks, RoboCat can learn new tasks after it is deployed, and the learning process speeds up as it learns more tasks.
- AudioPaLM is a new language model from Google that combines speech generation, speech understanding, and natural language processing. It’s a large language model that understands and produces voice.
- Voicemod is a tool for turning human speech into AI-generated speech in real time. The company offers a number of “sonic avatars” that can be further customized.
- Tree-of-thought prompting expands on chain-of-thought by causing language models to consider multiple reasoning paths in the process of generating an output.
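The search procedure behind tree-of-thought prompting can be sketched in a few lines. In this sketch both the thought proposer and the scorer are stubs; a real system would ask a language model to generate candidate next steps and to rate each partial reasoning path.

```python
# Minimal tree-of-thought search sketch. propose_thoughts() and score()
# are placeholder stubs standing in for LLM calls.

def propose_thoughts(path):
    """Stub: return candidate next reasoning steps for a partial path."""
    return [path + [f"step-{len(path)}-{i}"] for i in range(3)]

def score(path):
    """Stub: rate how promising a partial reasoning path looks."""
    return 1.0 / (1 + len(path))  # placeholder heuristic

def tree_of_thought(depth=3, beam=2):
    frontier = [[]]  # start from an empty reasoning path
    for _ in range(depth):
        candidates = [p for path in frontier for p in propose_thoughts(path)]
        # Keep only the `beam` most promising paths: the key difference
        # from chain-of-thought, which commits to a single path.
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]

best = tree_of_thought()
```

The beam width and depth control the tradeoff between exploration and the number of model calls.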
- Facebook/Meta has built a new generative speech model called Voicebox that they claim surpasses the performance of other models. They have not released an open source version. The paper describes some ways to distinguish generated speech from human speech.
- MIT Technology Review provides a good summary of key points in the EU’s draft proposal for regulating AI. It will probably take at least two years for this proposal to move through legislative channels.
- OpenLLM provides support for running a number of open source large language models in production. It includes the ability to integrate with tools like BentoML; support for LangChain is promised soon.
- Infinigen is a photorealistic natural-world 3D scene generator. It is designed to generate synthetic training data for AI systems. It currently generates terrains, plants, animals, and natural phenomena like weather; built objects may be added later.
- Facebook/Meta has created a new large model called I-JEPA (Image Joint Embedding Predictive Architecture). Meta claims it is more efficient than other models, and that it works by building a higher-level model of the world, as humans do. It is a first step towards implementing Yann LeCun’s ideas about next-generation artificial intelligence.
- MusicGen is a new generative model for music from Facebook/Meta. It sounds somewhat more convincing than other music models, but it’s not clear that it can do more than reassemble musical cliches.
- OpenAI has added a “function calling” API. The API allows an application to describe functions to the model. If GPT needs to call one of those functions, it returns a JSON object describing the function call. The application can call the function and return the result to the model.
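The flow looks roughly like this. The function schema follows the JSON Schema format the chat API expects; the model’s reply is simulated here so the sketch runs without an API key, and the `get_weather` function is a made-up example.

```python
import json

# Sketch of OpenAI-style function calling. The schema is what the
# application sends to the model; `reply` simulates what the API returns
# when the model decides a function call is needed.

functions = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def get_weather(city):
    return {"city": city, "temp_c": 21}   # stand-in implementation

# Simulated model reply containing a function call:
reply = {"function_call": {"name": "get_weather",
                           "arguments": '{"city": "Boston"}'}}

call = reply["function_call"]
args = json.loads(call["arguments"])      # arguments arrive as a JSON string
result = {"get_weather": get_weather}[call["name"]](**args)
# The application would send `result` back to the model in a follow-up
# message so the model can produce its final answer.
```

Note that the model only *describes* the call; the application stays in control of whether and how the function actually executes.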
- A study claims that Amazon Mechanical Turk workers are using AI to do their work. Mechanical Turk is often used to generate or label training data for AI systems. What impact will the use of AI to generate training data have on future generations of AI?
- What happens when generative AI systems are trained on data that they’ve produced? When Copilot is trained on code generated by Copilot, or GPT-4 on web content generated by GPT-4? Model collapse: the “long tails” of the distribution disappear, and the quality of the output suffers.
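The tail-loss effect is easy to see in a toy simulation: fit a Gaussian to some data, generate new “training data” from the fit while favoring typical samples (dropping the tails, as generative models tend to), then refit. The spread of the distribution shrinks generation after generation. This is only an illustration of the dynamic, not the actual experiments behind the model-collapse claim.

```python
import random
import statistics

# Toy model-collapse simulation: each generation is trained on slightly
# tail-trimmed samples from the previous generation's fitted model.

random.seed(0)
data = [random.gauss(0, 1) for _ in range(5000)]
spreads = []
for generation in range(10):
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    spreads.append(sigma)
    samples = sorted(random.gauss(mu, sigma) for _ in range(5000))
    data = samples[250:-250]   # keep the middle 90%: the tails vanish

# spreads[0] is about 1.0; spreads[-1] is far smaller -- the
# distribution has collapsed toward its mean.
```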
- FrugalGPT is an idea for reducing the cost of using large language models like GPT-4. The authors propose using a pipeline of language models (GPT-J, GPT-3, and GPT-4), refining the prompt at each stage so that most of the processing is done by free or inexpensive models.
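A cascade like this can be sketched with a confidence threshold that decides when to escalate to a more expensive model. The models and the confidence scorer below are stubs; in practice each tier would be a real LLM call, and the scorer a small model trained to judge answer reliability.

```python
# Sketch of a FrugalGPT-style model cascade. cheap_model() and
# expensive_model() are stand-ins that return (answer, confidence).

def cheap_model(prompt):
    return ("unsure", 0.4) if "hard" in prompt else ("42", 0.95)

def expensive_model(prompt):
    return ("a careful answer", 0.99)

CASCADE = [(cheap_model, 0.01), (expensive_model, 1.00)]  # (model, cost)

def answer(prompt, threshold=0.8):
    total_cost = 0.0
    for model, cost in CASCADE:
        reply, confidence = model(prompt)
        total_cost += cost
        if confidence >= threshold:   # good enough: stop escalating
            return reply, total_cost
    return reply, total_cost          # fell through to the last tier
```

Most queries stop at the cheap tier, so the average cost per query stays close to the cheap model’s price.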
- DeepMind’s AlphaDev has used AI to speed up sorting algorithms. Their software worked at the assembly language level; when they were done, they converted the code back to C++ and submitted it to the LLVM project, which has included it in its C++ standard library.
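The routines AlphaDev optimized are small fixed-size sorting networks (sort3, sort5, and so on). This Python version only illustrates the idea of a branchless compare-exchange network; it is not their actual assembly-level code.

```python
# A three-element sorting network: three compare-exchange steps,
# no data-dependent branches.

def sort3(a, b, c):
    a, b = min(a, b), max(a, b)   # compare-exchange (a, b)
    b, c = min(b, c), max(b, c)   # compare-exchange (b, c)
    a, b = min(a, b), max(a, b)   # compare-exchange (a, b) again
    return a, b, c
```

Because the sequence of operations is fixed regardless of the input, networks like this map naturally onto branch-free assembly, which is where AlphaDev found its instruction-count savings.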
- An artist has used Stable Diffusion to create functional QR codes that are also works of art and posted them on Reddit.
- The movement to regulate AI needs to learn from nuclear non-proliferation, where the key element isn’t hypothetical harms (we all know what bombs can do), but traceability and transparency. Model Cards and Datasheets for Datasets are a good start.
- Sam Altman talks about OpenAI’s plans for ChatGPT, saying that the service is currently compute-bound and needs more GPUs. This bottleneck is delaying features like fine-tuning custom models, expanding the context window, and multimodality (i.e., images).
- Facebook/Meta’s LIMA is a 65B parameter language model that’s based on LLaMA, but was fine-tuned on only 1,000 carefully chosen prompts and responses, without the use of RLHF (reinforcement learning from human feedback).
- It had to happen: Gandalf is a prompt injection game; your task is to trick an AI into revealing its password.
- Leptos is a new open source, full-stack, fully typed web framework for Rust. (How many days has it been since the last new web framework?)
- In the not-too-distant future, WebAssembly may replace containers; software deployed as WebAssembly is portable and much smaller.
- Adam Jacob talks about revitalizing DevOps with a new generation of tooling that uses insights from multiplayer games and digital twins.
- Wing is a new programming language with high-level abstractions for the cloud. The claim is that these abstractions will make it easier for AI code generation to write cloud-native programs.
- Simpleaichat is a Python package that simplifies writing programs that use GPT-3.5 or GPT-4.
- StarCoder and StarCoderBase form an open source language model for writing software (similar to Codex). It was trained on “a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process.”
- How do you measure developer experience? Metrics tend to be technical, ignoring personal issues like developer satisfaction, the friction they encounter day-to-day, and other aspects of lived experience.
- OpenChat is an open source chat console that is designed to connect to a large language model (currently GPT-*). It allows anyone to create their own customized chatbot. It supports unlimited memory (using the Pinecone vector database), and plans to add support for other language models.
- WebAssembly promises to improve runtime performance and latency on both the browser and the back end. It also promises to allow developers to create packages that run in any environment: Kubernetes clusters, edge devices, etc. But this capability is still a work in progress.
- People have started talking about software defined cars. This is an opportunity to rethink security from the ground up—or to create a much bigger attack surface.
- LMQL is a programming language designed for prompting language models. It’s an early example of a formal language for communicating with AI systems.
- Memory Spy is a web application that runs simple C programs and shows you how variables are represented in memory. Even if you aren’t a C programmer, you will learn a lot about how software works. Memory Spy was created by Julia Evans, @b0rk. Julia’s latest zine about how computers represent integer and floating point numbers is also well worth reading.
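You can get a taste of what Memory Spy shows from Python itself: the `struct` module exposes the raw bytes behind an integer and an IEEE-754 float.

```python
import struct

# Peek at in-memory representations: a 32-bit signed int and a 32-bit
# IEEE-754 float, packed in little-endian byte order.

int_bytes = struct.pack("<i", 1)      # 32-bit signed integer
float_bytes = struct.pack("<f", 1.0)  # 32-bit IEEE-754 float

print(int_bytes.hex())    # 01000000 -- the 1 sits in the lowest byte
print(float_bytes.hex())  # 0000803f -- sign 0, exponent 127, mantissa 0
```

The float bytes look nothing like the integer’s, which is exactly the kind of surprise these tools are good at making visible.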
Augmented and Virtual Reality
- David Pogue’s review of Apple Vision, the $3500 AR headset: Limited in a way that’s reminiscent of the first iPhone—“But no headset, no device, has ever hit this high a number on the wonder scale before.”
- Apple did it: they unveiled their AR/VR goggles. They are very expensive ($3499), look something like ski goggles, and have two hours of battery life on an external battery pack. It’s hard to imagine wearing them in public, though Apple may manage to make them fashionable.
- Apple’s big challenge with the Vision Pro goggles may not be getting people to use them; it may be getting developers to write compelling apps. Merely translating 2D apps into a 3D environment isn’t likely to be satisfactory. How can software really take advantage of 3D?
- Tim Bray’s post on what Augmented Reality is, and what that will require from software developers, is a must-read. It’s not Apple Vision.
- Hachette has created a Metaverse experience named “Beyond the Pages,” in part as an attempt to attract a younger audience. While the original experience was only open for two days, they have promised to schedule more.
- Ransomware is getting faster, which means that organizations have even less time to respond to an attack. To prevent becoming a victim, focus on the basics: access controls, strong passwords, multi-factor authentication, zero trust, penetration testing, and good backups.
- The number of attacks against systems running in “the cloud” is increasing rapidly. The biggest dangers are still errors in basic hygiene, including misconfigured identity and access management.
- AI Package Hallucination is a new technique for distributing malware. Ask a question that causes an AI to hallucinate a package or library. Create malware with that package name, and put it in an appropriate repository. Wait for someone else to get the same recommendation and install the malware. (This assumes AI hallucinations are consistent; I’m not sure that’s true.)
- A new standard allows NFTs to contain wallets, which contain NFTs. Users build collections of related resources. In addition to gaming (a character that “owns” its paraphernalia), this could be used for travel (a trip that contains tickets to events) or customer loyalty programs.
- The W3C has announced a new web standard for secure payment confirmation. The standard is intended to make checkout simpler and less prone to fraud.
- Tyler Cowen argues that cryptocurrency will play a role in transactions between AI systems. AI systems aren’t allowed to have their own bank accounts, and that’s unlikely to change in the near future. However, as they come into wider use, they will need to make transactions.