It would be interesting to see a graph of "performance per watt".
Mobile phones often have background tasks that do not need much CPU power. The A53 seems very suitable for this, so it would be nice to get some idea of how much power phones save by using an A53 for this instead of a high-performance core.
Geekerwan on YouTube tests the performance and efficiency of different cores at different frequencies. This is one good example where the A55 and A510 (successors to the A53) are graphed at around 15:40: https://youtu.be/s0ukXDnWlTY (honestly, the whole video is pretty informative)
It really depends on the workload. Modern thoughts on the matter trend towards a design called "race to sleep" where even your efficiency cores are still pretty beefy relative to an A53. That model is focused on quickly turning the core on, running through all of the work, and going back to sleep, rather than staying on longer on a much weaker core. Doing this effectively requires OS support to coalesce timer deadlines to bunch up as much work as possible on each wakeup cycle.
But with software support the model is very effective which is why you see most e-cores these days being relatively beefy OoOE cores that can leave the A53 in the dust. Whether that's Icestorm in the M1, Goldmont on Intel, or A57s on big.LITTLE ARM SoCs.
Pretty wild to me that the Cortex-A53 is a decade old & barely modified in its newer forms. It's hard to imagine a microarchitecture remaining so unchanged for so many years.
By comparison, Intel has been making small but solid revisions to the Atom cores. And the new E-core in the new N100 replacement is monstrously faster, yet still small. A potential Core M moment for Intel: a great small chip that expands.
> It's hard to imagine a microarchitecture remaining so unchanged for so many years.
It's a shame, because it was the best design from ARM; they're now focusing on Cortex-A7x and Cortex-X, which aren't anywhere near as power-efficient [0].
Meanwhile, their revised Cortex-A57 has been surpassed in performance/power/area by several RISC-V microarchitectures, such as SiFive's U74[1], used in the VisionFive2 and Star64, or even the open source XuanTie C910[2][3].
I still use original Raspberry Pi 2s (A53) in my server clusters; they are the lowest-power devices that can saturate (good) SD cards on random writes/reads and still have performance left over to serve calculations.
The A72 in the Raspberry Pi 4 is really the pinnacle of low-power CPU performance if you count $ and compatibility (Linux <-> GPU). ~3 of them can saturate symmetric 1 Gb/s with advanced computation and still have cycles left for calculations.
You can buy neither right now; we'll see if the 4 ever comes back. The only thing I'm 100% sure of is that the 5 will have some drawback compared to both the 2 and the 4.
I can concur on the A55 being a peak; the 5 W TDP, 22 nm RK3566 in my RG353M is mind-boggling! HL1 at 60 FPS and soon HL2 at 30 FPS.
But RISC-V is not progressing with Linux; GPU drivers and integration are the main problem. Historically, the Chinese boards never got any attention, and unfortunately you cannot count on them getting attention this time either.
Partly because companies across the board(ers) are now hiding kernel configs again. And "board support packages" are slow to be mainlined, if ever.
32-bit Arm is not being removed. There's the modern 32-bit only Cortex-A32 "AArch32 for full backward compatibility with Arm v7" [1]. Also expect 32-bit code compatibility to exist for a long time on Arm embedded cores for code size reasons.
Intel makes it sound like they will never drop their legacy support, and are just "testing the waters" with X86-S, but internally the discussion is over.
I'm reading between the lines a bit to try to answer, so if I missed your question or your intent please just point it out and I will adjust my answer.
32-bit is extremely useful. For intel, 16-bit is also very useful. The problem is that there is a strong divide between the people who find 32-bit and 16-bit useful at Intel, and the people who feel a need to force everyone onto UEFI "Class 3+" [1] and that everything else must be squeezed out, shamed, and moved to an emulator.
It sounds like Arm 32-bit support might be around for a while. I mainly wanted to clarify that Intel is dropping their 32-bit and 16-bit support.
It suggests its applications are fundamentally not power- or performance-sensitive, and care only about cost. It's the lagging edge of microarchitecture, where any improvement whatsoever is uninteresting because it would cost more than zero to develop.
> "The Performance P670 and P470 RISC-V cores are designed to take on ARM’s Cortex A53 and A55 cores, Drew Barbier, senior director of product management at SiFive tells eeNews Europe."
A compare-and-contrast article would make for good reading.
What I see there is that SiFive is, at the very least, five years behind the mid range of ARM CPUs, not only in performance, but also regarding the toolchain.
At a similar cost, what’s the real advantage of migrating the current catalog of wearables or IoT products, to RISC-V? There’s a proven and tested platform, widely used in the industry, and the alternative is still trying to catch up.
It's easier to understand if you look at an analysis of a much older, less dense die. Ken Shirriff is a superb source of these.
There are some visual clues. First, the chip pins are labeled in the spec so you can guess they’ll be close to relevant units, and also try to trace their connections throughout the die.
Second, units like memory have an obvious regular structure because they are made from many identical micro-units.
Third, if you see, for example, 16 identical adjacent units, you could guess this is something that deals with data 16 bits at a time. That narrows it down.
There are numerous clues like those.
You could also use tricks like using a thermal camera. What part gets hot when you do certain operations?
One of my professional regrets: when I worked at Intel in 1997, I had 30x42 plots of chip dies hanging on the wall in our lab... I wish I had taken some of them and framed them; they were beautiful.
This reminds me that I meant to pick up some of the wafers sometime. I've seen failures/surplus/whatever for sale on eBay and other places and meant to get one to frame or display as they are impressive in their way.
Got one of these keychain chips in the late eighties from a friend whose dad worked at IBM. I remember that it came with a comparably cheap key-ring clip and that the edges wore off nicely over time. Funny to be reminded of that. Thanks for mentioning it.
+1 on Ken Shirriff's blog, and his work with @curiousmarc on YouTube. Those gentlemen are national treasures whose work on restoring, documenting, and appreciating vintage computing and Apollo-era technology has been second to none in its breadth and depth.
I also personally really value their work—for anyone with intermediate to advanced knowledge of electronics engineering and computers, they are an invaluable source of educational entertainment as traditional mainstream media simply doesn’t cater to such niche audiences.
SoCs that have only A53 cores are terribly slow. I recently played with a Motorola Moto G22 phone with a MediaTek Helio G37. The phone has a nice design, 5 cameras, and enough RAM and storage (4GB/64GB), but the UI is laggy and slow; installing and launching apps and rendering web pages takes a lot of time.
This core is ideal for replacing MIPS in low-power platforms like SMB routers (MT7621). I think Qualcomm and MediaTek are using these extensively in router SoCs that were previously MIPS-based. These cores are less 'application' cores and more designed as low-power helper cores; for example, QC pairs them with network accelerators. Anything gigabit and beyond still requires bigger ARM cores or Intel, but below gigabit these are not bad.
Not trying to justify phone manufacturers not putting in the effort to optimize their software, but one way around the slow UIs is to go into the Developer Settings and set the UI animation scale to 0x. It's a setting I always enabled on my Android phones back when I used them.
It's sad that Android doesn't automatically fall back to a simpler GUI which takes less time to render. Even Windows XP (2000, 98?) got this right (with manual settings).
Even an A53 is a super computer when it comes to graphics compared to CPUs of yore.
A serious improvement on these budget phones is turning animations completely off; it removes most of the stuttering, and GPU usage no longer spikes just from pulling up the keyboard.
> Android doesn't automatically fall back to a simpler GUI
How much simpler can it be, given that everything seems to already be flat and borderless? As your last sentence alludes to, Windows and other desktop OSs worked perfectly fine with far more complex UIs (including windows) on far less powerful hardware. Mobile UIs seem to be quite primitive in comparison.
In other words, this is entirely a software problem.
A visually simpler GUI (such as Luna vs. "classic" on Windows XP) isn't necessarily less resource-intensive. Implementing an actual different, less resource-intensive rendering path could help, but would double the development effort.
Nope... I'm using a tiny SBC, a Radxa Zero with an A53 CPU, running Manjaro Linux, as an ultra-low-power daily driver, and it is perfectly usable for light browsing, programming, or productivity.
It boots Linux in 7 seconds, and the Xfce desktop is pretty snappy.
Kernel is 6.1 and RAM is only 4GB.
It opens Lazarus almost instantly, and FPC compiles ARM binaries super fast.
Agreed. I'm perfectly happy to have a couple of A53s in my phone for background tasks. Four feels a bit overkill but okay, maybe it makes the big.LITTLE design work better.
But I've always been disappointed by devices that are all A53s.
And when I see devices that have eight A53s and nothing else, I have to assume that they are just trying to trick people into thinking it's a more powerful device than it actually is.
>I have to assume that they are just trying to trick people into thinking it's a more powerful device than it actually is
Why would you think that people who actually look up and care about the hardware are simultaneously unable to read the first sentence on Wikipedia and have no idea what it is? Do you really believe that customers of $100 budget phones are tricked into expecting powerful performance?
Oddly enough, a very sizeable portion of customers seem to be aware of core counts and "memory" size (often confused with disk size, though). The i3-5-7 naming scheme is also pretty widely understood. I used to work at an electronics store when I was a teenager (6-7 years ago, so not that long ago!), and that kind of awareness made it a struggle to steer people who knew just enough to hurt themselves toward buying an actually good product. I mean, that MediaTek is an 8-core CPU, so why was I trying to sell him a 2/4-core Qualcomm? Or they'd buy a laptop with a Celeron and barely enough flash to fit Windows, but it was a quad core! Obviously better than the 2-core i3 that actually has space to install software, lol.
I'd guess 25% of customers knew about those at a superficial level, and another 10% actually knew what they should be looking for.
I’d guess there are a lot of people who see “eight-core CPU!” and don’t research any further. Same way that PC buyers used to stare at GHz and ignore the total performance of the CPU.
It's not entirely their fault either. Manufacturers know what spec people are more aware of and include that in their product and make it front and center. It's more common than I'd hope to have a laptop with an APU paired with a separate GPU that is barely better than the one inside the APU. But people go "gaming, so dedicated gpu" and buy the product. What a waste all around.
I would not call the Redmi Note 5 (SD626) I've been using for the last 4 years "terribly slow"; instead I call it "perfectly usable". This whole "laggy UI" thing people complain about is beyond me; the UI is GPU-rendered and keeps up with most of what I throw at it. I don't expect a low-power device I charge every third day to perform like a mains-connected system.
> I would not call the Redmi Note 5... "terribly slow"; instead I call it "perfectly usable".
The software you use plays a rather large role in how the hardware performs. Some people here like to live on the OEM-designed happy path, where things tend to just work. That means using Google Apps for everything, an expectation that the latest video streaming social platforms will open quickly and not stutter, and scrolling the Google Play Store or Google Maps will be a fluid experience.
Others may use simpler apps, or expect less of their phones. I'm in the latter category, and I suspect you are as well. While the BlackBerry KeyOne I use daily was panned by some six months after release in 2017 for being too slow, I instead killed off nearly everything else that would run in the background - including and specifically any Google frameworks and apps.
Some software companies have made a point of taking hardware gains for granted. Most people have new phones with fast processors, so some companies will push devs to take shortcuts. I'm quietly indignant about that, though that rant is rather tangential to your original question about how some have such different experiences from yours.
You may not expect desktop-class performance, though others do. Display scrolling on a mobile handset is an indicator of quality that separates cheap devices from those that one might actually want to use to get work (or play) done.
The thing is that the display scrolls without any hiccups on this device. Pulling down the notification shade is always fluid, gestures work without hiccups, and controls appear and disappear smoothly. The UI is not where these devices tend to show their lack of performance; for that you need to open a browser and load some heavily JavaScript-encumbered sites. As an example, I use OpenHAB with some embedded (live) Grafana charts for control and data visualisation. Opening the site or the app (which just embeds the site) on my device takes a second; on my wife's S23FE it appears a lot faster. Changing between pages on the site or app is also a lot faster on her device than on mine. Since I program the thing, I like using a somewhat slower - but still perfectly usable - device, as it ensures anything I produce will be fast enough even on less than top-of-the-line hardware. I follow the same creed when it comes to PC hardware by using older devices; it has always served me well.
Understood, though consuming the sausage (as an end user) is often a very different exercise than making it, so to speak. Battery tradeoffs are a real consideration for end consumers, though are not as much an issue within a development environment where the test device is tethered to the host development machine.
They are meant to be used in a big.LITTLE configuration. So the A53 cores should be active in the low-power mode, and more powerful cores should be active in high-power mode.
I'll bet that's due to slow storage and not the CPU(s); I've worked on a few handsets and tablets, and write performance was a large part of whether they felt laggy or not. It's quite obvious when the storage is full and the flash controller spends a lot of time doing RMW ops and halting writes.
Yep, this was also the case with my old phone. Opening apps took a while, but after that everything was more fluid, which clearly indicated that storage played a part in the device's slowness. Though the 1.5 GB of RAM and the quad-core Cortex-A7 still made the device pretty slow.
Don't think so - CPU and GPU are far more important for the speed and fluidity of UI than flash write speed.
Yes, if the storage is full it can kill both the performance and stability of Android, but devices with slow SoC are slow even with plenty of free space.
> but devices with slow SoC are slow even with plenty of free space.
We'd find during initial development (i.e., raw, bare Android) that the initial bring up would have good-to-excellent performance, but as the storage began to fill (more "stuff" in the baked-in system/cache partitions, user-installed apps, etc.) it would lag more and more. You'd be surprised how in the early kernels (2.6-3.x series) "iowait"s would slow everything down, UI included, and not just loading speed of apps and such.
In regard to the A55 and the A510, can anyone explain the design goals of these? Do they refine the A53 as a "small" CPU? Or are they larger more featureful CPUs?
The main purpose of Cortex-A55 and Cortex-A510 is to implement additional instructions over those of Cortex-A53, respectively Armv8.2-A and Armv9.0-A.
This is necessary to make them ISA-compatible with the big cores and medium-size cores with which they are intended to be paired.
Besides the main goal of implementing improved ISAs, they take advantage of the fact that the cost of transistors has dropped a lot since the time of the Cortex-A53. They implement various microarchitectural enhancements that yield decently greater performance at identical clock frequency, while keeping similar area and power-consumption ratios between the small cores (like the A510) and the medium-size cores (like the A710), as has been the case since the first big.LITTLE ARM cores (Cortex-A15 paired with Cortex-A7).
ARM has always avoided publishing any precise numbers for the design goals of the little cores, but it seems that they are usually designed to use about 25% of the area of the medium-size cores and to have a power consumption of around 0.5 W per core.
In my impression, each later one is supposed to be a successor of the previous within about the same chip area. The A55 is basically a refined A53 with support for DynamIQ. The point of the A510 was support for Armv9 with SVE2, but it is also wider, because people expect faster processors. To amortise the cost of the larger back-end, it lost 32-bit support, and there's an option to make a cluster of two share the same FP/SIMD/SVE unit and L2 cache.
The A53 is a fantastic workhorse for all kinds of embedded workloads. I was worried bloat would creep into later models, while the tried and true A53 moves toward obsolescence. From what you're saying, it seems like they are trying not to get carried away with it.
"Efficient cores for low power tasks and performance cores for demanding applications" is a catchphrase I've seen hundreds of times but I've never once seen someone actually demonstrate it or test it, or even really explain how my phone decides which is which. Does WhatsApp run on an efficiency core most of the time but swap to a performance core when it's converting a video to send?
https://eclecticlight.co/ has multiple articles characterising the M1 (and M2) and how the macOS scheduler uses them.
I'm sure Android's scheduler does things differently, but it's at least an idea of the sort of things that can happen.
For Macs (and I assume iOS), the basics are that background processes get scheduled on E cores exclusively, and higher-priority processes get scheduled on P cores preferentially but may be scheduled on E cores if the P cores are at full occupancy.
And it uses cgroups too. Process.THREAD_GROUP_* has some Android ones, but different vendors sometimes write their own to try to be clever and increase performance.
It's also worth bearing in mind that there's been a lot of work put into that scheduler over the years so it will make better decisions about what to run where when the cores aren't all the same.
"Generally, when the game is in the foreground, persistent threads such as the game thread and render thread should run on the high-performance large cores, whereas other process and worker threads may be scheduled on smaller cores."
There's also a Wikipedia article [2] which talks a little about scheduling. I imagine Android probably has more specific context it can use as hints to its scheduler about where a thread should be run.
It is very popular only because almost all companies that are neither Chinese nor smartphone-oriented have failed to introduce any products with less obsolete ARM cores, for many, many years.
NXP has begun to introduce products with Cortex-A55 only recently, but they should always be preferred for any new designs over the legacy products with Cortex-A53, because the Armv8.2-A ISA implemented by Cortex-A55 corrects some serious mistakes of the Cortex-A53 ISA, e.g. the lack of atomic read-modify-write memory accesses.
The people who still choose Cortex-A53 for any new projects are typically clueless about the software implications of their choice.
Unfortunately, there are only 3 companies that offer CPUs for automotive and embedded applications with non-obsolete ARM cores: NVIDIA, Qualcomm and MediaTek. All 3 demand an arm and a leg for their CPUs, so whenever the performance of a Cortex-A55 is not enough it is much cheaper to use Intel Atom CPUs than to use more recent ARM cores, e.g. Cortex-A78.
This is a little harsh, upgrade cycles in these fields are very long. I'm working on a project right now where we are upgrading from iMX6 (quad A9) to iMX8 (quad A53).
It's interesting that the author used a photo of the Tegra X1 here. My understanding is that the Nintendo Switch (most widely distributed Tegra X1?) never or very rarely uses its A53 cores.
There's a lot to be said for open source ISAs like RISC-V. But it's a lot harder to create a top of the line open source microarchitecture implementing it that's competitive to closed source designs. A Linux kernel developer can make changes and test them several times a day for a cost in electricity measured in cents. An equivalent build/test cycle on a CPU core is going to be north of a month and a million dollars. Simulation helps but to optimize speed and yield you really need to build the chips and due to physical effects that's a process with much weaker abstraction barriers than software development. So I'm skeptical that we'll ever have cutting edge open source microarchitectures.
I am not talking about open-source microarchitectures, but about a worldwide non-toxic ISA: namely, anybody can FREELY create a RISC-V microarchitecture AT WORLDWIDE SCALE, closed or open, which you cannot do with ARM or x86.
Of course, worldwide royalty-free is not enough (or just being allowed to implement the ISA...); silicon is really about performance, and I sincerely hope RISC-V will end up providing microarchitectures (open or not) "good enough" to do the job.
I am perfectly aware RISC-V will fail if it does not provide really good implementations at scale. Rumors say "really performant" implementations are not expected before 2024.
> namely, anybody can FREELY create a RISC-V microarchitecture AT WORLDWIDE SCALE
The problem with this argument is that it ignores the cost of creating the microarchitecture. It’s almost certainly cheaper to license an Arm A series core than to create a comparable RISC-V core from scratch.
Sure, we have a firm like SiFive that licenses RV cores to third parties, but the existence of firms like Arm and SiFive shouldn't be taken for granted. Rumour has it SiFive was almost taken over by Intel. Thankfully, Nvidia was stopped from buying Arm.
If you wish for Arm’s demise you may get the end of their business model and that probably isn’t a great outcome.
RISC-V is not limited to SiFive; there are several other implementations. And yes, ARM and x86 made enough people with enough resources angry to make RISC-V happen, and a decade on it is still gaining momentum even though the market is over-saturated with ARM and x86. That last point is really positive, and it allows us to think RISC-V could be successful.
But I agree that without _REALLY_ performant implementations, RISC-V WILL fail, and rumors say that things won't start to get serious before 2024. Nevertheless, in its current state, failure is still a more than plausible outcome.
A word of caution: until the "performance" is actually here, better stay humble.
As far as I know, spec-wise RISC-V has been more or less ready for a while; it is only missing very-high-performance implementations.
My first real RISC-V target is a 100% RV64 assembly keyboard firmware, though. I'm looking at MangoPi MQ-Pro boards, but I wish we had 'smaller' RV64 GPIO/USB boards for that - maybe a small GPIO/USB board with a USB block plus an FPGA with enough gates to instantiate a small RV64 core.
I think this may be a bit too many instructions (I did not see hardware-accelerated memcpy/memset, though). I guess I would use only a small subset of them. Since RISC-V is a royalty-free standard, writing assembly directly is worth it, provided abuse of a macro processor and code generation is avoided.
I want to share your optimism, but I advise you to keep your cool. There is a long road to reaching the performance of x86/ARM microarchitectures (it is harder for RISC-V since the market is over-saturated). And those performant implementations must get access to the best silicon process nodes... and that...
Competition. Multiple vendors can compete with RISC-V, while ARM can prevent all competition with licenses. Whether it will really happen is a different question, but I'm optimistic about the potential.
Possibly, but historically high-end semiconductors seem to turn into a winner-takes-all market more often than not. If that's the case, it's probably preferable to have someone like ARM at the top rather than Intel or Nvidia.
Most people want to build things, not cores; the cores aren't the core competency. So the hope is: release what cores you have and let others improve them for you. Western Digital's SweRV core is seemingly an example of this thinking.
Would you not expect companies that actually want to make cores to have an advantage?
And the companies that don’t want to make it their core business but can afford enough resources (e.g. Google, Apple, Amazon) would just use them to leverage their core products.
I could only see this on the lower end where margins/required R&D investment are relatively low.