As smartphones become more accessible, billions of people have come to depend on their features for daily life. One of the most important of those features is the camera. OEMs have spent years improving camera quality, and they seem to have finally figured it out: the best way to achieve DSLR-quality photos wasn't just better sensors, but better intelligence.
Smartphone manufacturers are limited in the hardware department because of a lack of options. Many key components have only one or two suppliers, which leads most flagship devices to share the same parts. This has pushed OEMs to look for other ways to improve photography, and three industry leaders have found their answer in machine learning.
To put it as simply as possible, machine learning (ML) is a type of programming that lets computers improve their capabilities without further human intervention. A human programs the basic components of the system, then allows the computer to refine itself within parameters the programmer initially set.
Think of machine learning like electronic evolution — we give the system all the basic tools it needs to get started, then the system refines itself through trial and error. Again, though, this is a simplified explanation. If you'd like a more detailed summary, I highly recommend watching this video.
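That trial-and-error refinement can be shown in a few lines of code. The toy example below is purely illustrative (nothing like what a phone actually runs): the programmer supplies the model's shape and the learning rule, and the machine repeatedly nudges its one parameter to reduce its error on example data.

```python
import random

random.seed(0)  # make the toy run reproducible

# Examples where the "right" answer is y = 3.0 * x
data = [(x, 3.0 * x) for x in range(1, 6)]

w = random.uniform(-1, 1)  # start with a random guess
learning_rate = 0.01

for step in range(1000):
    x, y_true = random.choice(data)
    error = w * x - y_true           # how wrong the current guess is
    w -= learning_rate * error * x   # nudge w to shrink the error

print(round(w, 2))  # converges to 3.0
```

After a thousand small corrections, the system has "learned" the relationship it was never directly told, which is the essence of the electronic evolution described above.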
As it pertains to smartphone photography, the main thing you should know about machine learning is that devices can use this technology to refine their object recognition skills and better decipher patterns in images.
However, many OEMs market machine learning as artificial intelligence when it is really just one component of AI. Artificial intelligence is a broad concept describing machines that function entirely independently of humans while thinking like them: learning on their own, analyzing their surroundings without assistance, and making independent decisions.
Smartphones are only just beginning to become efficient at machine learning, and they remain far from incorporating all the components of artificial intelligence. Their learned behavior comes from data provided during production, so their learning capabilities are limited or nonexistent once the devices ship to consumers. Even so, this learned behavior greatly assists with tasks performed by smartphones, including photography.
With machine learning, datasets can be incorporated into a smartphone that "teach" the phone what a professional-style photo looks like. With this information, a device can analyze the photos you capture or the images being viewed by the lens, then adjust the image to mimic a pro-style photo. And to process all this, a piece of hardware called an "NPU" was born.
A Neural Network Processing Unit (NPU) is a component installed on or alongside the SoC that operates independently of the other parts (such as the CPU and GPU). The NPU is designed specifically to analyze and process machine learning data. While the CPU and the GPU can process this type of data, the NPU can process it faster and use less power doing so. This makes the entire SoC more power efficient, as all data is processed by the most efficient component.
Before NPUs, some companies would use cloud computing to process machine learning data. However, this limits performance to the strength of the internet connection and prevents any processing when the web isn't available. With the push for smartphones in emerging markets where the internet isn't widely available (see Google's Android One program), NPUs will need to be included in phones' SoCs to bring machine learning to all markets.
Eventually, these NPUs could become powerful enough to use the information provided by users to continue learning. By analyzing how you use your device, phones could better optimize themselves to your specific tendencies and usage and creep us closer to artificial intelligence.
However, the way an NPU is used will depend on the manufacturer. As you will see, the three companies who added NPUs to their flagship phones this year utilize the component differently. That said, camera improvements seem to be the central theme so far.
Google has gone against the grain with their camera this year. Instead of deploying dual cameras like most other flagship devices, they opted for a single lens. To compensate for the lack of a second lens, they integrated machine learning into the Pixel 2's camera.
Google Camera is the default camera app for Pixel devices, and its HDR+ feature has become a famous example of machine learning being used to improve smartphone photography. The app takes a large number of photos each time you press the shutter button, then uses machine learning to combine these photos in a way that results in better dynamic range.
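The core idea behind merging a burst of frames can be sketched simply: averaging several noisy exposures of the same scene cancels out much of the sensor noise. This is only the noise-reduction principle, not Google's actual pipeline, which also aligns frames and merges them in a far more sophisticated way.

```python
import numpy as np

rng = np.random.default_rng(42)

# A synthetic "true" scene, then eight noisy captures of it
scene = rng.uniform(0.2, 0.8, size=(64, 64))
frames = [scene + rng.normal(0, 0.1, scene.shape) for _ in range(8)]

# Naive merge: average the burst
merged = np.mean(frames, axis=0)

noise_single = np.abs(frames[0] - scene).mean()
noise_merged = np.abs(merged - scene).mean()
print(noise_merged < noise_single)  # True: the merge is closer to the scene
```

Averaging N frames reduces random noise by roughly a factor of the square root of N, which is why a burst of quick captures can rival a single long exposure without the motion blur.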
Prior to the Pixel 2 and 2 XL, the Google Camera app depended on the Hexagon DSP (Qualcomm's off-the-shelf NPU) to process any machine learning. This year's Pixel phones, however, include a dedicated, Google-built NPU called the Pixel Visual Core.
Essentially, Google is using its NPU as an ML-powered Image Signal Processor (ISP). The Google Camera app calls on the Pixel Visual Core to perform machine learning tasks that enhance photos and videos. HDR+ is just one example of Google's usage of machine learning, but the Pixel 2 packs several more ML-powered camera enhancements, including a portrait mode with only one lens.
With the Android 8.1 Oreo update, the Pixel Visual Core has been unlocked for third-party apps. Now, any app that properly utilizes Android's Camera2 API can take advantage of the machine learning enhancements made by the Pixel Visual Core. Put simply, when you take a picture in an app like Instagram, Google's HDR+ algorithm and other machine learning enhancements will be applied to your photo as if you captured it with the default Google Camera app.
Thanks to machine learning, the Pixel 2 has become the standard for smartphone photography, besting even the iPhone X and Galaxy Note 8 in many areas. In order to achieve this, however, Google had to design their own NPU.
Since the Pixel 2 uses Qualcomm's pre-made Snapdragon 835 SoC, adding a custom NPU would be difficult. Qualcomm already has the Hexagon DSP, which performs a similar task. Also, the Snapdragon 835 is the SoC of choice for all but a handful of flagship devices, so Qualcomm would prefer not to make special additions to their SoC just for one manufacturer. Therefore, Google opted to design its own NPU which operates both independently and in conjunction with the Qualcomm SoC.
However, the most impressive aspect of the Pixel Visual Core is its ability to perform this high-level computing five times faster than the CPU performs the same processes, while using one-tenth of the power. That works out to roughly fifty times the performance per watt. The addition of the Visual Core is how the Pixel 2 was able to surpass its competition's (and its predecessor's) photography performance.
Apple's decision to implement an NPU — known as the Neural Engine — was the result of a need. When Apple designed the new iPhone X with an edge-to-edge display, they knew that Touch ID would be affected. With no space to house the fingerprint scanner, Apple innovated by adding Face ID, a new method for secure authentication.
However, achieving security and reliability in a system that relies on facial recognition requires collecting and analyzing an enormous amount of data. According to Apple, each time an iPhone X user unlocks their device, it projects thirty thousand infrared dots onto the user's face. These dots form a map, which is then analyzed and processed by the Neural Engine to verify the user.
The Neural Engine has two cores that perform real-time processing with a theoretical output of 600 billion operations per second. It is contained within the Image Signal Processor inside of Apple's acclaimed A11 Bionic SoC, and assists with everything from Face ID to photography.
For photography, the Neural Engine is used to achieve Portrait Mode on the front-facing camera. Instead of relying on dual lenses like the rear camera, Apple sends the depth information gathered by the TrueDepth camera system (the same sensors that power Face ID) to the NPU, which analyzes that data to produce the bokeh effect.
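The principle of depth-assisted portrait mode is straightforward: keep pixels the depth map marks as near the camera sharp, and blur the rest. The sketch below uses synthetic stand-ins for the image and depth data and a naive box blur; it shows only the idea, not Apple's actual processing.

```python
import numpy as np

def box_blur(img, k=5):
    """Naive box blur: average each pixel's k-by-k neighborhood."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

rng = np.random.default_rng(0)
image = rng.uniform(0, 1, size=(32, 32))   # stand-in for the captured photo
depth = np.ones((32, 32))                  # stand-in for the depth map
depth[8:24, 8:24] = 0.2                    # a "subject" close to the camera

blurred = box_blur(image)
portrait = np.where(depth < 0.5, image, blurred)  # keep the subject sharp

# Subject pixels pass through untouched; background pixels come from the blur
print(np.allclose(portrait[8:24, 8:24], image[8:24, 8:24]))  # True
```

A real system works from a much denser, noisier depth map and blends the transition between subject and background, but the depth-gated compositing step is the same in spirit.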
The Neural Engine also analyzes various objects while taking a photo to improve image compression without distorting your photos. It even assists in tracking and focusing on subjects to help you take faster photos. In short, Apple is using their NPU in many of the same ways that Google used theirs, except Apple's is built into the SoC.
Huawei's approach to machine learning attempts to enhance everyday tasks such as security and overall performance. The Huawei-built Kirin 970 chipset has a dedicated NPU for several different machine learning computations, and has made its debut in the Huawei Mate 10 and Mate 10 Pro.
The NPU comprises five "engines" that handle specific functions: a Vision Engine, Experience Engine, Performance Engine, Power Management Engine, and third-party App Engine. These engines improve efficiency in various tasks, such as lowering power consumption during augmented reality and instantaneously translating text via Microsoft Translator.
Like Apple and Google, however, the Kirin 970's NPU is largely used to improve photography. There are two specific areas where machine learning helps: Before you take the shot and after you take the shot.
Before an image is captured, the NPU provides real-time object recognition and scene adjustment. The NPU can identify up to thirteen different scenes and objects as they appear in the viewfinder, and you'll be notified of this recognition with an icon in the bottom-left corner of the display (while in Auto Mode).
Based on the objects and scene, the NPU will automatically configure the camera app with the optimal settings for the picture you're about to take. Then, after you take the photo, the image's colors, contrast, brightness, and other attributes are automatically adjusted to produce professional-quality photos.
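Conceptually, this works like a lookup table keyed on the scene label the NPU produces, followed by a tonal adjustment. The scene names, preset values, and adjustment formula below are invented for illustration; Huawei does not publish its actual tuning tables.

```python
# Hypothetical per-scene presets (illustrative values only)
PRESETS = {
    "food":     {"saturation": 1.3, "contrast": 1.1},
    "greenery": {"saturation": 1.2, "contrast": 1.0},
    "night":    {"saturation": 1.0, "contrast": 1.2},
}

def settings_for(scene_label):
    """Fall back to neutral settings for unrecognized scenes."""
    return PRESETS.get(scene_label, {"saturation": 1.0, "contrast": 1.0})

def apply_contrast(pixel, contrast):
    """Scale a 0-1 pixel value around mid-gray, clamped to [0, 1]."""
    return min(1.0, max(0.0, 0.5 + (pixel - 0.5) * contrast))

s = settings_for("night")
print(round(apply_contrast(0.8, s["contrast"]), 2))  # 0.86: brights pushed brighter
```

The point is that the expensive part (recognizing the scene) runs on the NPU, while the adjustment itself is a cheap transformation any camera app can apply.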
Ideally, Huawei doesn't want users to have to fumble through menus to take great photos. Instead, the NPU handles all of that for you, so you can take professional-style photos without any adjustment beforehand.
Smartphone photography has gone through eras that have typically been defined by a big feature being adopted by major OEMs. First, we had higher megapixels, then optical image stabilization, secondary cameras, and optical zoom. Now, we're entering the machine learning era of smartphone photography, and the NPU is quite literally at the core of it all.
How do you feel about this new era in smartphone photography? What do you predict the next one will be? Let us know in the comments below.