Something alluded to, but not really elaborated on, is what the motivation is for emulating progressively newer CPUs, versus virtualizing them.
With old hardware, the sort I imagine 86Box is mainly used to emulate (it's how I use it, anyway), old OSes rely on specific behavior of old systems and peripherals that can't be easily virtualized through KVM + QEMU or the like. It's a mixture of processors being too fast and behaving subtly differently, possibly due to differences in "undefined behavior".
I doubt there's a binary cutoff point where it's obviously more useful to virtualize or emulate, but it's obvious that emulating an 80386 is ideal, and virtualizing a "Core 2 Duo" is ideal, for running software from those respective periods. Is the Pentium III desirable to emulate, and simply not computationally feasible, or is this just people wanting to make cool things for cool things' sakes? Or a little of both?
What about old versions of VirtualBox? As far as I understand, VirtualBox originally used some kind of dynamic recompilation to run on hosts without hardware virtualization support (like VT-x). It used to run most instructions directly on the host CPU, emulating only certain ones. I think they removed it at some point, though. Is it not accurate enough? I remember it used to run OS X much better than other virtualizers.
I'm not sure why this was ever dropped. If I remember right, back in the early 2010s this ran XP very usably on a C2D XP laptop. I'm pretty sure I even got 7 working at some point; the only real limit was that it absolutely could not run 64-bit VMs.
It's because dynamic recompilation as done in VirtualBox & QEMU is on the order of 10x slower than CPU-accelerated virtualization. You don't notice that (as much) with DOS or a simple Windows 9x VM. But you will notice it if you're running busy SQL Server, IIS, .NET, Linux, PostgreSQL or MySQL servers.
I recall a while back Linus Tech Tips pointed out that the primary failure point for old machines wasn't the CPUs, which tend to be piling up cheaply (I can't say how true this is for pre-PIII parts), but instead the motherboards.
I wonder if in some sense there isn't a market for someone to just make new motherboards for this stuff using modern parts instead. I realize there are some licensing issues after a certain point, but in theory everything up to a PIII should be doable. Particularly since memory controllers are in the chipset, there are options to use things like modern RAM chips and then 'downgrade' performance in the chipset to stay compatible. This could also be used to allow modern M.2 SATA drives to be run in IDE emulation mode locally, or even to have SD/CF options directly on the board.
The main downside at that point is space of course, hardware always takes up more space than an emulator ;)
I kind of miss add-in processor cards to allow a computer to run software from another architecture. Those devices were basically a whole computer on a card or add-on that would use the host computer for I/O with some level of integration. Stuff like the Apple IIe card for the Macintosh LC, or the MacCharlie sidecar for the original Macintosh that allowed it to run PC programs.
I bet there are modern PCI cards out there that can add a hard-to-emulate processor to a consumer/prosumer-grade system, but I'm not sure what to search for.
For the Intel chips up to and including the Pentium 4 licensing a chipset would not be an issue. VIA made chipsets back when those products were new. I think there was a lawsuit about it, but VIA prevailed. You could license the chipset from them, then use it with an authentic Intel CPU.
But to your original point: CPUs don't really fail unless abused is my understanding. The bad ones are filtered out at the factory before they ever get to consumers.
That might even be ideal insofar as the chipset is driver-compatible with the equivalent from the time. VIA wasn't anything special but they were super common and thus decently supported IIRC.
There is a precedent for that with the reproduction Amiga motherboards that came out a couple of years ago. Those were actually just PCBs but something more accessible would be cool.
Chinese companies are making new boards for >=Nehalem Xeons using recycled server chipsets, mostly because they're still OK for gaming on a budget.
If there was a demand they could probably do the same for older models if they can harvest enough chipsets. There were quite a few 430HX's on embedded modules that could be recycled...
This is an avenue I’d like to see more pursuit of, and not just for x86 machines: new PowerPC Mac motherboards that can be fitted into standard ATX PC cases and can be powered by unmodded off-the-shelf PSUs would be huge.
Where will you get your supply of custom chipsets? Do you want to reverse engineer Apple's already badly documented chipsets and reimplement them on fast enough FPGAs, maybe even supporting more modern RAM or improved off-chip caches? It would be nice to have a mATX NewWorld Mac you can maintain for playing old MacOS games, but the farther you go into the G4 (or even worse G5) era, the more advanced the GPUs you would have to reimplement as well. The PPC6xx-era 2D on-board graphics may be in reach of such a project, but I can't see anyone without Nvidia's active support releasing a Quadro FX 4500 compatible GPU clone.
Chipsets indeed prove a challenge, but it’s not insurmountable. QEMU’s emulation of a G3 tower is a decent example of this.
There’s decent availability of PCI and AGP GPUs supported by OS 9 and early OS X, especially if you include PC variants that can be flashed to work with Mac OS, so I wouldn’t worry about video too much in the short term… just make sure the motherboard has a PCI or AGP slot.
Maybe you could do a hardware version of what the emulation community does and say "this FPGA should act like a chipset X, here's how to program it, but don't ask us for the blob."
> old OSes rely on specific behavior of old systems and peripherals that can't be easily virtualized through KVM + QEMU or the like
This sounds interesting. Do you have examples of these behaviors where something is faulty using standard virtualization versus hardware-level emulation? (I know clock speed can make a difference.)
These crashes are not related to clock speed. While Windows 9x already cannot run on these CPUs due to clock speed, the clock speed issues have mostly been patched in 9x.
And 9x can still be virtualized even on the latest Intel iMac, with nested paging enabled (or not). On AMD, you need to disable nested paging due to the above change in behavior.
The Intel Macs were a strange not-quite-PC-compatible; the BIOS was EFI, and I believe things like the 8042 (which controls A20) weren't present although I'd have to check the schematic to be sure. That said, I'd be surprised if port 92h "fast A20" wasn't present --- that's been in nearly every chipset since ~386 or so.
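For anyone who hasn't met the A20 gate before, here is a minimal sketch of what port 92h controls. It only models the address masking and the two low bits of the port (bit 0 is the fast reset, bit 1 is the A20 gate); the function names are made up for illustration and nothing here touches real I/O ports.

```python
A20_BIT = 1 << 20           # address line 20
PORT_92_FAST_RESET = 0x01   # bit 0: pulsing this resets the CPU
PORT_92_A20_GATE = 0x02     # bit 1: 1 = A20 enabled


def real_mode_linear(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset (can reach just past 1 MiB)."""
    return (segment << 4) + offset


def apply_a20(linear: int, port_92: int) -> int:
    """Mask address line 20 unless the A20 gate bit in port 92h is set."""
    if port_92 & PORT_92_A20_GATE:
        return linear
    return linear & ~A20_BIT


def enable_fast_a20(port_92: int) -> int:
    """Value to write back to port 92h: set the A20 bit, keep the reset bit clear."""
    return (port_92 | PORT_92_A20_GATE) & ~PORT_92_FAST_RESET & 0xFF


# FFFF:0010 is the first byte above 1 MiB; with A20 masked it wraps to 0.
addr = real_mode_linear(0xFFFF, 0x0010)
wrapped = apply_a20(addr, port_92=0x00)
unwrapped = apply_a20(addr, port_92=enable_fast_a20(0x00))
```

Clearing bit 0 on the write-back is the usual precaution, since setting it would reset the machine instead of just opening the gate.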
It’s probably not even possible to emulate the Pentium Pro (or any other out-of-order x86 CPU) in a cycle-accurate fashion at the original speeds on contemporary hardware. Just attempting to match the original behavior for instruction scheduling, cache models, branch prediction, etc. would blow your CPU budget.
I've actually put quite a lot of thought into this topic (but for a different, non-x86 out-of-order CPU)
It might not be possible to do it with an interpreter, but a well-designed JIT should be able to shift the cost of calculating instruction scheduling and cycle-costs from execution time to JIT time.
Since OoO CPUs spend long stretches of time between branch mispredicts and L1 cache misses, you can get long sequences of instructions that execute with the exact same timings every time. A tracing JIT is perfect for this use case. You just need to normalise the pipeline state on entry (and since this normally happens after a mispredict, the pipeline is often drained), and then exit on every branch mispredict or L1 cache miss and start a new trace.
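To make the idea concrete, here is a rough sketch of the caching part only, not a JIT. It assumes a trace can be keyed by its entry PC plus a normalised pipeline state, and that an expensive detailed model can compute the cycle cost once. `TimingTraceCache` and `toy_slow_model` are hypothetical names, and the cost formula is invented purely so the example runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

TraceKey = Tuple[int, int]  # (entry PC, normalised pipeline state id)


@dataclass
class TimingTraceCache:
    """Memoises the cycle cost of a trace so the slow model runs once per key."""
    slow_model: Callable[[int, int], Tuple[int, int]]  # -> (cycles, exit PC)
    traces: Dict[TraceKey, Tuple[int, int]] = field(default_factory=dict)
    slow_model_calls: int = 0

    def run(self, pc: int, pipeline_state: int = 0) -> Tuple[int, int]:
        key = (pc, pipeline_state)
        if key not in self.traces:
            # "JIT time": pay for detailed scheduling once per trace.
            self.slow_model_calls += 1
            self.traces[key] = self.slow_model(pc, pipeline_state)
        # "Execution time": replay the recorded cost and exit PC.
        return self.traces[key]


def toy_slow_model(pc: int, pipeline_state: int) -> Tuple[int, int]:
    """Hypothetical stand-in for a detailed out-of-order scheduling model."""
    length = 8                                  # pretend each trace is 8 instructions
    cycles = 3 * length // 2 + pipeline_state   # made-up cost formula
    return cycles, pc + 4 * length


cache = TimingTraceCache(toy_slow_model)
total_cycles = 0
for _ in range(1000):                           # a hot loop re-entering the same trace
    cycles, exit_pc = cache.run(0x1000)
    total_cycles += cycles
```

In a real design the trace would end at the first branch mispredict or L1 miss, and the exit would hand a fresh pipeline state to the next lookup; this sketch skips all of that and only shows why the hot path can be cheap.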
I suspect such a scheme might be fast enough for a Pentium III, I just need to find some time to actually try out my ideas at some point.
You are right, they get a 190-360x overhead, and I think I need to get it down to about 20-30x overhead to get realtime on current CPUs. It's only off by a single order of magnitude.
Though, it would be fast enough for some usecases. 10% of realtime would be bearable for some TAS usecases.
It might be possible for me to improve on what they have. For starters, they aren't using a trace-based approach and I think they are applying the memoisation on every single branch, correctly predicted or not, and every single memory access (even when they hit L1). But I would need a 10x improvement over what they had, and that's a big ask.
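The arithmetic behind "a single order of magnitude" and "10% of realtime" can be checked in a few lines. The only inputs are the figures quoted above (190-360x measured overhead, 20-30x assumed to be the realtime threshold); the function names are just for illustration.

```python
def required_speedup(current_overhead: float, realtime_overhead: float) -> float:
    """How many times faster the simulator must get to reach realtime."""
    return current_overhead / realtime_overhead


def fraction_of_realtime(current_overhead: float, realtime_overhead: float) -> float:
    """Speed of the existing simulator relative to realtime (1.0 = full speed)."""
    return realtime_overhead / current_overhead


# Matching ends of the two ranges: roughly a 10x gap.
low_gap = required_speedup(190, 20)       # 9.5
high_gap = required_speedup(360, 30)      # 12.0

# Best and worst case for running the existing approach unchanged.
best_case = fraction_of_realtime(190, 30)   # about 16% of realtime
worst_case = fraction_of_realtime(360, 20)  # about 6% of realtime
```

So the unmodified approach would land somewhere around 6-16% of realtime under these assumptions, which is where the "10% of realtime" figure for TAS-style use comes from.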
gem5 is a simulator that aims to do precisely all that, and yes it is extremely slow. Probably on the order of 100x slower, but I don't have timings to hand.
Similarly booting a small processor compiled into C++ with verilator is very taxing on the processor too.
Why would you want to do cycle-accurate emulation? Software ran on a wide variety of different hardware; I doubt there’s any software that relies on cycle accuracy for a Pentium Pro-level machine. And afaik, box86 is about running software, not modelling some specific machine exactly.
From what I've seen of the emulation community, I can only think it's because many emulators are small enough that they're feasible for a single person to work on and maybe get too attached to.
It's difficult to let go of your code baby, especially when you could technically do it all on your own. It's probably easier when the task is so large there's no way you could do it alone
You ever go back on the PCem forums and see the old posts where battler would request completely ignorant things from walker? I beg to differ on 86 handling anything well. The constant PCem smearing to the point where walker just left from all the stress. I consider 86box a patchwork mess. They might strive for accuracy but I don't believe there is a full understanding of the original code.
Sorry, but I'm not getting the point of cycle-accurate emulation of recent x86 CPUs. What could possibly benefit? The performance is going to be a disaster compared to, say, QEMU....
The point of cycle-accurate emulation is to emulate something with accuracy down to the cycle, so that software that relies on some particularity will work on it.
Performance is secondary. You can think of the emulation as lasting documentation, which in time becomes usable due to improvements in hardware and software, or can be used as a reference for FPGA, ASIC or alternative implementations.
The question is which software that requires a Pentium III also requires some CPU timing peculiarity. There have been enough CPU vendors that such software would have been difficult to run even at its era.
This is not a videogame console where all hardware was the same, or the early PC world where everything was designed for the Intel 8086 timings as on the IBM PC. This is the modern PC world. Which exact CPU model would you even choose to emulate?
I'm of the belief that hardware is not truly preserved until you have a fully-accurate emulation of it.
You are correct that software doesn't require fully accurate timings. Just accurate enough to bypass any timing bugs and replicate the experience. Especially outside of the console space. If your only goal is running all known software, then you can get away with some massive accuracy bugs.
But there is more to hardware preservation than simply running all software that might have been shipped on a platform. Some people want to do retro programming and develop new software for a platform. And if you don't have accurate emulation, then the more likely you are to introduce a bug that works in the emulator but not on real hardware. The less accurate an emulator, the more often you have to check on real hardware. And since there was such a wide variety of real hardware, your collection would have to be huge to ensure extensive testing.
You could argue that people wanting to do retro-coding as a hobby should just test on real hardware, but I'd argue that raises the cost of the hobby. Also, in the distant future, the last Pentium III will die, and acquiring real hardware might not be possible. I have no idea if people will still be interested in retro-coding for the Pentium III that far in the future, but my point is the hardware is not truly preserved unless they can.
BTW, the Pentium III was used in a console, with identical hardware configurations, so an accurate P3 might be more useful there.
Though, the Pentium III was used in a video game console. An accurate emulation could be more useful there.
Neither Xbox or PS2 are emulated at a cycle level though. They mostly rely on recompilation and API emulation, and PCSX2 has gotten pretty far on that. That generation was really the start of cross platform being the norm, and so the start of a dramatic drop off in games requiring very specific hardware details. If your game had to run on Xbox/PS2/WinXP then there has to be some level of portability considered in the software.
> Neither Xbox or PS2 are emulated at a cycle level though
Yet... Like I said, you can get pretty far with low-levels of accuracy. PS2 emulation is actually quite timing sensitive.
I have at least one bug in Dolphin that I investigated that can't get fixed correctly until we get significantly better GPU timings. There are also some speed-running strategies that rely on generating enough lag, but don't work in Dolphin because it (usually) emulates the CPU and GPU way too fast.
> your only goal is running all known software, then you can get away with some massive accuracy bugs.
> Some people want to do retro programming and develop new software for a platform. And if you don't have accurate emulation, then the more likely you are to introduce a bug that works in the emulator but not on real hardware.
The two things go hand in hand. If era-developed software is unlikely to suffer from timing bugs, then _your own software_ is also unlikely to suffer from timing bugs. It's down to the same argument.
It's like claiming that because I developed for Pentium 4, my software is unlikely to work on the Pentium 3. Save for the very explicit case that I use some new extensions, how crazily out of the way would I need to go in order to even remotely hit such an issue?
In fact, it can all be summarized to: what CPU timing would you even emulate ? Why would you even target the P3 _specifically_? Why not Transmeta?
Note that this does not apply to accuracy emulation of accompanying hardware, but then again I would also claim that accuracy of hardware emulation is hardly relevant post-P3, since _the real hardware_ often is massively inaccurate by any definition of the word. Why accurately model a specific Radeon card, when the budget model of the same year is completely different , with the differences abstracted by hacks in the driver ?
Accurate timing is most useful when you are retro-programming a video game, or something else real time.
How do you know if your game will run at 60 fps if the execution times aren't accurate?
> what CPU timing would you even emulate
Ideally your emulator would support as many CPU + hardware configurations as possible, at many different speeds, so you can test as many as you want.
But just one single accurate hardware configuration is better than none. At least then I can say "I programmed this game and it runs on a Pentium III 550E, with a Riva TNT2".
It's exactly the same when doing retro-programming on real hardware. If you only have one PC, then you can only confirm it's working on that exact same hardware configuration. But accurate emulators have advantages due to cost and ability to easily support multiple configurations.
> How do you know if your game will run at 60 fps if the execution times aren't accurate?
Again, this is not a console. If you rely on a specific Pentium 3's instruction timings to reach 60 fps, your game is not going to reach 60fps _on any other PC_, not even if someone has an identical CPU, since any single other difference in hardware, configuration, or even layout of the filesystem is going to matter much more.
You just can't get away with the same kinds of bug you can get away with in consoles, because even just trying ATI vs NVIDIA (or any two different brands of accelerator) is already going to be a completely different environment and timings, likely enough to trigger all those bugs (or at least more than different instruction timings will).
i.e. even the simplest of emulators (incl. a virtualizer) with a runtime cap is going to suffice for the usecase of mildly estimating a framerate based on the CPU of some era. And there's very little value to increase the accuracy of such estimation since with so wildly varying PC hardware anything you can produce is going to be irrelevant anyway.
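A "runtime cap" in this sense is not much code. Here is a rough sketch of the pacing calculation, assuming the emulator or virtualizer can report how many guest cycles it has retired in a time slice; both function names are hypothetical, and it deliberately knows nothing about caches, stalls or per-instruction costs.

```python
def guest_seconds_for(cycles: int, guest_hz: float) -> float:
    """Wall-clock time the guest CPU would have needed for this many cycles."""
    return cycles / guest_hz


def sleep_needed(cycles_run: int, guest_hz: float, host_elapsed_s: float) -> float:
    """How long to idle so the guest does not run ahead of its nominal clock."""
    return max(0.0, guest_seconds_for(cycles_run, guest_hz) - host_elapsed_s)


# A 500 MHz guest that retired 500,000 cycles in 0.2 ms of host time
# has to wait out the rest of its 1 ms slice.
pause = sleep_needed(500_000, 500e6, host_elapsed_s=0.0002)

# If the host took longer than the guest would have, there is nothing to wait for.
no_pause = sleep_needed(500_000, 500e6, host_elapsed_s=0.005)
```

This caps average speed only. Whether that is a good enough estimate is exactly what the rest of the thread argues about.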
(How to make a similar accurate-enough estimation of GPU performance is a different story).
There is a world of difference between your inability to see the appeal/utility of a thing and that thing being actually worthless, as you are so stridently insisting here.
For a mental exercise, just to prove it to yourself, why don't _you_ try examining all the reasons such accurate emulation might be desirable?
The other poster already provided a reasonable motivation - preservation. But since this discussion started, you've only really come out swinging with disparagement. One has to wonder why you are putting so much energy into suppressing and trashing someone's hobby.
Frankly it looks to me like you have some kind of preconceived and inflexible bias, or that you may be trying to discover the appeal of this effort in a pointlessly adversarial way. Not a great look. If you really want people to think you have a smart, winning argument, maybe try to show some understanding of both sides of the coin before floating your attempt at a clever and withering denouncement.
PC games stopped being CPU cycle-bound long ago in mid-late 90's.
Most of the issues you would have are with games from the high-end 486 to Pentium I/II era (especially the multimedia ones), lots of which were speed-bound, but those games will surely be interpreted by ScummVM one day or another (Macromedia Director engine).
> There is a world of difference between your inability to see the appeal/utility of a thing and that thing being actually worthless, as you are so stridently insisting here.
I am asking a question, and answering "preservation" (which is not really the answer the poster made, since his goal is new development) without giving an actual concrete example of what behavior needs such accurate preservation kind of defeats the purpose of asking the question in the first place.
If the answer is "for the sake of it" that is also fine. But I'm unaware of anything post-P3 that would really require cycle-level emulation, so I ask. Most PC emulators "draw the line" around that era for a reason, even the ones who wouldn't necessarily have performance problems with newer machines (like DosBox).
> But since this discussion started, you've only really come out swinging with disparagement. One has to wonder why you are putting so much energy into suppressing and trashing someone's hobby.
What do you think? Not only do I have the same hobby, my work is also related to this. I am most definitely not interested in trashing it.
It is also the _main thesis_ of this entire article, so why shouldn't we discuss it?
To add an example to your arguments, the C64 and PC/XT demo scenes show what can be done with cycle accurate retrocomputing. Especially 8088mph: https://youtu.be/yHXx3orN35Y
If someone was to decide to develop a game for a minimum performance requirement of a Pentium III at 500 MHz and some GPU, for some reason (which would be a completely arbitrary choice, but that's the hobby, so roll with it), then the only way they can possibly check that it meets that minimum requirement is to test on a machine with that configuration.
Either a real machine, or an accurate emulator.
It doesn't matter if there are other PC hardware configurations out there with different performance. A minimum requirement just means "I tested on this machine, and it meets the minimums." Ideally you should underspec your minimum-requirement test machine so that your target audience can reasonably be expected to meet it.
You can't substitute in virtualization. That has zero chance in hell of providing a realistic estimate of performance, even if you paired it with an accurate gpu emulator.
Modern CPUs simply have very different performance characteristics; instructions that might have huge stalls on the P3 might be extremely cheap under virtualization. Cache sizes are also wildly different.
If you use a proper, but inaccurate emulator, you get different issues. Even if it was tuned to provide a decent estimate of CPU performance over average code (and they are typically tuned to overestimate CPU performance, because people playing games would rather the frame drops of real hardware were not emulated), it's just an average that doesn't take into account things like cache misses and branch mispredicts.
If you were to write code with a lot of cache misses or branch mispredicts, your inaccurate emulator would massively overestimate its performance compared to a real CPU.
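A toy comparison shows how far an "average CPI" model can drift from one that charges for stalls. Every number below (instruction count, miss rates, penalties) is invented for illustration and is not a measured Pentium III figure; the function names are hypothetical too.

```python
def cycles_average_model(instructions: int, avg_cpi: float) -> float:
    """Inaccurate emulator: one tuned average cost per instruction."""
    return instructions * avg_cpi


def cycles_with_stalls(instructions: int, base_cpi: float,
                       miss_rate: float, miss_penalty: float,
                       mispredict_rate: float, mispredict_penalty: float) -> float:
    """Slightly better model: base cost plus per-instruction stall penalties."""
    stall_cpi = miss_rate * miss_penalty + mispredict_rate * mispredict_penalty
    return instructions * (base_cpi + stall_cpi)


def fps(cycles_per_frame: float, clock_hz: float) -> float:
    return clock_hz / cycles_per_frame


CLOCK_HZ = 500e6             # the 500 MHz target from the discussion
INSTR_PER_FRAME = 5_000_000  # assumed workload

# Emulator tuned for "average" code.
estimated = fps(cycles_average_model(INSTR_PER_FRAME, avg_cpi=1.5), CLOCK_HZ)

# Cache-unfriendly inner loop: 5% of instructions miss (60 cycles each),
# 2% are mispredicted branches (12 cycles each).
actual = fps(cycles_with_stalls(INSTR_PER_FRAME, 1.0, 0.05, 60, 0.02, 12), CLOCK_HZ)
```

With these made-up inputs the average model reports a comfortable 60+ fps while the stall-aware model lands in the low 20s, which is the kind of gap being described.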
The various issues just add up and it becomes impossible to profile and optimise the game you are developing unless you have an accurate emulator. Other solutions will all point to different parts of the code being hot.
Also, remember this is within the era when you might be still developing a game with a software renderer, and if not you still have to do vertex transform and lighting on the cpu.
Personally I'm not that interested in accurate emulation of PCs; the issues get a lot worse when it comes to developing games for 5th, 6th and maybe even 7th gen consoles. That's where my true interest in accurate OoO emulation lies. But I can see why someone might want accurate PC emulation too.
> If someone was to decide to develop a game for a minimum performance requirement of a Pentium III at 500mhz and some GPU, for some reason (which would be a completely arbitrary choice, but that's the hobby, so roll with it) then the only way they can possibly check that out meets that minimum requirement is to test on a machine with that configuration.
This just doesn't happen in modern PC development, save for heavyweights who can afford multiple identical hardware configurations (e.g. HPC clusters). I know I'm repeating myself, but the variety of configurations just makes this highly implausible. Sure, you can be some demoscene type of guy who decides to target specifically this configuration, but then you're literally targeting one processor out of hundreds, and per your own words, the fact that it works on the 500 MHz doesn't mean it will work with the same performance on the next generation or even on the 550 MHz variant. I guess this is obviously fine, but really stretching it. You'll quickly end up having something that only works on your machine, with the same starting disk image, etc.
Even PCem doesn't fully simulate the x86 cache because there is no benefit to it, and that includes cores from eras which were much more sensitive to timing. Branch mispredictions? Forget about it. Most P3 software is going to run concurrently to some other software, anyway.
I'm not saying that you don't need a cycle-accurate simulator to get real timings. I'm saying that with such a large divergence in configurations and environments, virtualization (or any other inaccurate emulator) is likely to provide a performance level that is quite accurately somewhere in the interval. Most specially since you will have actually calibrated it to that interval beforehand :)
Now on consoles I can see the benefit. Consoles are lots of identical hardware, operating systems that tend to get out of the way, and the people who develop for them only test (for obvious reasons) on the console hardware itself or at most a developer edition which has the same hardware (for obvious reasons again). You can have a silent bug that depends on timing of a mispredicted branch or the relative speed between the bus accesses of two cores and _never_ notice it since your testing environment is exactly 1 device (such a bug would immediately flare up on a PC on like the 2nd reboot).
Whatever it is that you develop for any one such console, it is highly likely it will work on all the million other sold consoles. Consoles are practically designed to have reproducible environments.
On the other hand you practically can emulate the entire x86 software catalog with emulators which _still_ have large differences in behavior at the actual instruction level compared to the hardware, so the instruction timing doesn't really seem important, and creating now some software that does depend on it seems .. complicated.
As an anecdote, not long ago I was working on a x86 emulator, and to my horror I realized that the push/pop instructions were actually miscomputing the operand size on a rather common but not primary situation (long mode but with a 32-bit segment). The emulator was pushing the stack by double the amount it should, and pushing/popping the high dword of registers it shouldn't have clobbered. This was actually happening in some of the most critical operating system code out there (bootloaders, WoW, etc.) ... and yet the bug had been in the emulator for years and no one had been the wiser, booting 64-bit OSes just fine :)
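For reference, the rule the emulator got wrong fits in a few lines. This is a sketch of the architectural operand-size rule for PUSH/POP of a general-purpose register, written from the x86-64 manuals and not taken from any particular emulator; the function and parameter names are made up.

```python
def push_pop_operand_size(long_mode_active: bool, cs_l: bool,
                          cs_d: bool, prefix_66: bool) -> int:
    """Bytes moved by PUSH/POP r for the given code-segment attributes."""
    if long_mode_active and cs_l:
        # 64-bit mode: default is 64-bit, 66h selects 16-bit,
        # and a 32-bit push is not encodable.
        return 2 if prefix_66 else 8
    # Compatibility mode (long mode, CS.L = 0) and legacy protected mode:
    # CS.D picks the default size and the 66h prefix toggles it.
    use_32 = cs_d != prefix_66
    return 4 if use_32 else 2


# The case from the anecdote: long mode is active, but the code segment is
# a 32-bit one, so the stack must move by 4 bytes, not 8.
compat_size = push_pop_operand_size(True, cs_l=False, cs_d=True, prefix_66=False)
native_size = push_pop_operand_size(True, cs_l=True, cs_d=False, prefix_66=False)
```

Keying the decision on "long mode is active" alone, without also checking CS.L, gives exactly the doubled stack adjustment described above.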
I think the variety in configurations is completely irrelevant.
If you aren't using at least one accurate configuration for your testing, there is a huge risk that you miss your performance target by a huge margin. Your 60fps game could end up running at 20fps on your target minimum hardware. Small performance inaccuracies can massively add up if you have a non-emulated cache-miss or branch-mispredict delay in your innermost loop.
I think you are massively overestimating the accuracy of the timings that you can get through virtualisation or semi-accurate emulation. Yes, they are probably accurate enough for running any historic software from the era, as most code for the PC is well-behaved enough not to do the wrong thing when running too fast.
It's just for the use-case of developing new software: as soon as you start optimising or profiling, you need accurate timings. And yes, we might be talking about weird demo-scene style projects along the lines of "I want to get the absolute best possible graphics out of the computer I had 25 years ago, no frames dropped, no wasted cpu cycles". I'm talking about the kind of project where someone is writing inner loops with intrinsics or in assembly.
You might argue that such a project is a massive edge case that's not worth catering to. And if you are writing an emulator, that's a 100% legitimate position to hold; emulators shouldn't have to cater for every possible usecase. My point is only "If you don't have a 100% accurate emulator, and there is some niche use case it can't emulate, then the hardware isn't fully preserved" and that it would be nice if an accurate emulator existed.
> Now on consoles I can see the benefit. Consoles are lots of identical hardware, operating systems that tend to get out of the way, and the people who develop for them only test (for obvious reasons) on the console hardware itself or at most a developer edition which has the same hardware (for obvious reasons again).
Yes, I've chased after bugs in console emulation (Dolphin Emulator) that were impossible to fix correctly without significantly more accurate emulation.
Like the game which memset a staging buffer before data had finished DMAing out. The game was only saved on real hardware because after memsetting, it invalidated the cachelines and in typical situations, none of the memset cachelines had been evicted. Impossible to correctly fix without emulating the existence of an L2 cache. We eventually resorted to patching the game to fix the bug.
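Here is a toy model of why that game survives on hardware with a write-back cache and breaks without one. It is a sketch of the general mechanism only: a single-level cache with an invented 32-byte line size, no eviction, and names (`WriteBackCache`, `dma_sees`) that are hypothetical and not taken from Dolphin.

```python
LINE_SIZE = 32  # assumed cache line size for this toy model


class WriteBackCache:
    """Writes land in cached lines and reach RAM only on an explicit flush."""

    def __init__(self, ram: bytearray):
        self.ram = ram
        self.lines = {}  # line base address -> cached copy of that line

    def write(self, addr: int, value: int) -> None:
        base = addr - addr % LINE_SIZE
        if base not in self.lines:
            self.lines[base] = bytearray(self.ram[base:base + LINE_SIZE])
        self.lines[base][addr - base] = value

    def flush(self, base: int) -> None:
        """Write the line back to RAM, then drop it."""
        if base in self.lines:
            self.ram[base:base + LINE_SIZE] = self.lines.pop(base)

    def invalidate(self, base: int) -> None:
        """Drop the line WITHOUT writing it back: cached writes are lost."""
        self.lines.pop(base, None)


def dma_sees(with_cache: bool) -> bytes:
    """Contents a DMA engine reads from RAM after the buggy memset."""
    ram = bytearray(b"\xAB" * 64)        # staging buffer still waiting for DMA
    if with_cache:
        cache = WriteBackCache(ram)
        for addr in range(64):           # memset(buffer, 0, 64) through the cache
            cache.write(addr, 0)
        for base in range(0, 64, LINE_SIZE):
            cache.invalidate(base)       # zeros never reach RAM
    else:
        for addr in range(64):           # emulator with no cache: zeros hit RAM
            ram[addr] = 0
    return bytes(ram)
```

With the cache modelled, the DMA engine still reads the original data because the zeros were thrown away on invalidate. Without it, the memset clobbers the buffer first. On real hardware the outcome also depends on none of those lines being evicted early, which this sketch does not model.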
Or games where video decoding stutters, because it has a hot inner loop that pushes the out-of-order CPU and has very few cache misses. It executes faster over the whole frame on real hardware than in Dolphin's CPU timing model, which assumes a certain number of cache misses. The game must have tuned its video codec to use as much CPU time as possible.
We have speed-running tricks that don't work in Dolphin, because they depend on lagging the game. And games that freak out when the GPU executes too fast, but when you adjust the timings for those, other games freak out because the GPU is executing too slow. It's impossible to calculate accurate GPU timings without running much of vertex transform and a basic depth rasterizer.
These are projects I'd love to work on at some point, accurate CPU and GPU timings for Dolphin, even if they don't run at full speed and bus contention is still ignored. I think it might be possible to get within the correct order of magnitude (so 10-50% of realtime), which is workable for some usecases like TASes and testing bugs.
> I'm of the belief that hardware is not truly preserved until you have a fully-accurate emulation of it.
This makes sense in principle, but exact emulation is something computationally prohibitive even for a (probably) 386¹. The computational problems of exact emulation have been described in a famous article about emulating the SNES².
I suppose that emulating even "just" a superscalar architecture is going to be prohibitive (due to the split into micro ops), and an out-of-order one would probably require transistor-level emulation (or at least, another, lower, level of emulation).
¹=Fairly arbitrary; I'm basing this just on the complexity of emulating the SNES, and the following considerations.
There is SMP P3 emulation. qemu is an emulator, not a virtualizer.
There is no cycle-accurate emulation, but again the question is: why would you need cycle-accurate emulation to run win2k? It runs just fine on virtualizers, even.
But I don't want virtualization, I want cycle accurate emulation of my p3, actually, I want emulation of the underlying chip so that I can run the original Intel microcode on it. I want my bios chips and all that jazz. I want a ship in a bottle, complete with the seven seas and the world to travel. I want it described in software, so it can be emulated, on whatever the future brings, forever.
Probably no software was ever written specifically for the Pentium 3 (so that it would break on a Pentium 4, or different Pentium 3), so there isn't much reason to emulate it too accurately.
What you are asking is FPGA territory anyway, maybe not even that, the Pentium 3 is too complex to accurately emulate.
>Here's the discussion from reddit. It appears as though the developer is loudly promising PIII emulation but has a history of not delivering.
The developer of 86Box as a rule has a history of backing up their claims, hence 86Box, which has gone from strength to strength in the years since being forced to fork from PCem.
A *LOT* of people float into the project's discord and ask "pentium III when?" -this addresses that, as well as answering the overly ambitious promises made by a recent fork of 86box.
Frankly, despite being in r/emulation, most of these people seem like they're new to the emulation scene with how dismissive they're being about this statement. It's as if they're unaware of just how much drama can happen behind the scenes and how it's sometimes useful to just clearly and unambiguously state "this is what happened from our perspective."
Having the wrong scope can be very detrimental to a project in general, and although I'm not an emulator developer I feel this is probably especially true in this case.
They need to stop adding features and start completing features. Get accurate instruction cycle counts for all the CPUs up to and including the Pentium Pro. Be clearer about what is happening when you disable cache in the BIOS for platforms that allow this. Just saying that it reverts back to interpreter mode seems false. It allows me to run very fast clock CPUs with no slowdown. It's doing something different than just disabling the dynrec. I still just use PCem every day instead though. I trust that code way more than 86Box, mainly because of the machine window with the history graph and seemingly way more reproducible behavior in terms of performance. 86Box is so wonky.
Socket 8 Pentium Pro 200 is just universally great for everything. Disable cache and move down to 150 and you can run older stuff quite well with no hyper speed. I have a bunch of old system presets I use every day. I've got a Win98SE setup that is usually Socket 7, Super 7 or Socket 8 for most stuff. It has four 55 GB VHD drives mounted to it. It's great. Every DOS game is just there. Bunch of Windows software ISOs too. Got a bunch of 2 GB DOS-compatible disks and every DOS version installed and ready to go. I'll toy around with all the OS/2s sometimes too, very cool system.
Article claims it’s not possible to emulate a Pentium III at full speed on an Apple M1, and so there’s no point to a fork attempting this feat. But isn’t that exactly what Rosetta 2 does?
The original Rosetta software provided during the PowerPC to Intel x86 transition was an instruction and system call translation layer. I haven't used Rosetta 2 but it's probably similar.
As I understand it, 86Box is emulating specific chip implementations and all of the peripherals, cards and boards needed to run them, not just translating instructions. This makes the emulated machines appear to the guest OS as real hardware, I assume with all the flaws and quirks included.
Rosetta 2 only needs to emulate a macOS-specific subset of x86-64 user mode. 86Box needs to emulate kernel mode, including paging and the guest OS hosting its own user mode.
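To make the paging point concrete, here is a minimal sketch of the extra work a full-system emulator takes on for guest memory accesses: walking the guest's own page tables. It assumes the classic 32-bit x86 two-level layout with 4 KiB pages (no PAE, no large pages, no permission checks), and the names `translate` and `read_u32` are made up for illustration, not taken from 86Box. A user-mode translator can lean on the host OS for memory mapping instead, so it never has to do this itself.

```python
PAGE_PRESENT = 0x1  # bit 0 of a page directory/table entry


def translate(read_u32, cr3, vaddr):
    """Translate a guest virtual address to a guest physical address.

    read_u32(addr) is a hypothetical callback returning the 32-bit word
    stored at a guest physical address; cr3 holds the page directory base.
    """
    # The top 10 bits of the address index the page directory.
    pde = read_u32((cr3 & 0xFFFFF000) + ((vaddr >> 22) & 0x3FF) * 4)
    if not pde & PAGE_PRESENT:
        raise LookupError("page fault: directory entry not present")
    # The middle 10 bits index the page table that the entry points to.
    pte = read_u32((pde & 0xFFFFF000) + ((vaddr >> 12) & 0x3FF) * 4)
    if not pte & PAGE_PRESENT:
        raise LookupError("page fault: table entry not present")
    # The low 12 bits are the offset inside the 4 KiB page.
    return (pte & 0xFFFFF000) | (vaddr & 0xFFF)


# Toy guest memory: page directory at 0x1000, one page table at 0x2000.
guest_memory = {
    0x1000 + 1 * 4: 0x2000 | PAGE_PRESENT,  # directory slot 1 -> table
    0x2000 + 2 * 4: 0x5000 | PAGE_PRESENT,  # table slot 2 -> frame 0x5000
}


def read_guest(addr):
    return guest_memory.get(addr, 0)


print(hex(translate(read_guest, 0x1000, 0x00402034)))  # 0x5034
```

Real emulators cache these lookups in a software TLB, but the walk still has to be modelled, along with the page faults it can raise.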
Arguably, cycle-accurate emulation should be done with FPGA support anyway. The Pentium III is old enough that there should be no legal obstacle to implementing new compatible hardware.
The problem won't be legal, but scale: the Pentium 3 has ~9.5M transistors [1], and the world's largest FPGA, at a staggering price point of 55,000 dollars, has only 9M logic gates [2].
Probably you could use an FPGA a bit smaller if you use FPGA-native memory for L1-L3 instead of transistors, but you'll nevertheless need an FPGA that is many orders of magnitude too expensive for a hobbyist niche project. Also, you would need serious quality photographs of the die... which can be done for old and small chips on a budget, but will be really expensive even for something of the Pentium 3 era.
What has always interested me though: how are CPUs and modern GPUs actually developed, given that FPGAs are way, way too small and silicon-making runs extremely expensive?
Logic Gate != Transistor and FPGA LUT != Logic Gate
FPGA LUTs are quite sophisticated, the LUTs in both Xilinx and Intel's latest offerings are able to implement an arbitrary boolean logic function with up to 6 inputs and 1 output. The FPGA also comes packed with other specialized hardware such as on chip memory and multipliers so you do not need to burn logic to use those things. You very likely could fit a Pentium III on a moderately large FPGA.
The real challenge would be matching the clock speed of the original processor. Even the 400MHz of the first Pentium IIIs on the market might be difficult, and getting near 1GHz is likely impossible.
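As a rough illustration of why a LUT is not comparable to a single gate or transistor: a 6-input LUT is effectively a 64-entry truth table, so one LUT can stand in for any boolean function of six inputs. The sketch below models that in software; `make_lut6`, `eval_lut6` and the example function are hypothetical names for illustration, not any vendor's tooling.

```python
def make_lut6(func):
    """Pack any 6-input boolean function into a 64-bit truth table."""
    table = 0
    for index in range(64):
        # Bit i of the index is the value of input i.
        bits = [(index >> i) & 1 for i in range(6)]
        if func(*bits):
            table |= 1 << index
    return table


def eval_lut6(table, *inputs):
    """Look up the output for six input bits, as the LUT hardware would."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (table >> index) & 1


def gated_majority(a, b, c, d, e, enable):
    """Example logic: majority vote of five inputs, gated by an enable bit."""
    return enable and (a + b + c + d + e) >= 3


table = make_lut6(gated_majority)
print(eval_lut6(table, 1, 1, 1, 0, 0, 1))  # 1
print(eval_lut6(table, 1, 1, 1, 0, 0, 0))  # 0
```

Built from discrete gates, the same function would take several gates, each made of several transistors, which is why transistor counts and LUT counts can't be compared one to one.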
I'd love to know who uses a 9M LUT FPGA! The article mentions companies like Arm testing their designs, but do they really test on FPGAs? (And would such a test be useful? ASICs and FPGAs have fairly different timing because of the length of wires).
There are some x86 clones embedded in "PC in an ethernet port" style parts for upgrading old industrial designs. IIRC they do things like a 486 clone clocked at a few hundred MHz.
Completely OT, but switching to a Firefox tab with TFA loaded takes noticeably longer than switching to other tabs (about 3 seconds vs less than a second).
I don't understand: the yuzu (Switch) emulator has excellent performance and correctness. It perfectly emulates a modern ARM CPU at multiple GHz and properly emulates the Nvidia Tegra GPU.
Why would emulating x86 be so much harder, especially since you can actually use the host x86 CPU via hardware hypervisors, e.g.
https://github.com/intel/haxm
Yuzu uses Unicorn, which uses QEMU code for functional, but not cycle-accurate, CPU emulation. Most software these days doesn't depend on the exact behavior of the hardware, even on Nintendo Switch, and the emulator can also use application-specific patches to make up for the difference.
86Box is not entirely cycle accurate, but much more accurate than QEMU, so no patches are required.
gem5 is on the other end of the spectrum: it can emulate hardware very accurately, but very slowly.
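A toy sketch of the difference, for anyone unfamiliar with it: a functional emulator only has to get the final register and memory state right, while a timing-aware one also has to account for how long each instruction took. The instruction set and the cycle costs below are invented for illustration and do not correspond to any real CPU or to 86Box's internals.

```python
# Hypothetical per-instruction costs; real tables differ per CPU model.
CYCLES = {"mov": 1, "add": 1, "dec": 1, "jnz": 3}


def run(program, track_cycles=False):
    """Interpret a toy program; optionally account for elapsed cycles."""
    regs = {"a": 0, "b": 0, "c": 0}
    pc = 0
    cycles = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "mov":      # mov reg, immediate
            regs[args[0]] = args[1]
        elif op == "add":    # add reg, reg
            regs[args[0]] += regs[args[1]]
        elif op == "dec":    # dec reg
            regs[args[0]] -= 1
        elif op == "jnz":    # jump to index if reg != 0
            if regs[args[0]] != 0:
                pc = args[1]
        else:
            raise ValueError(f"unknown instruction: {op}")
        if track_cycles:
            cycles += CYCLES[op]
    return regs, cycles


# A loop that adds b to a, c times: the kind of code old software
# sometimes used as a delay loop.
program = [
    ("mov", "a", 0),
    ("mov", "b", 3),
    ("mov", "c", 4),
    ("add", "a", "b"),
    ("dec", "c"),
    ("jnz", "c", 3),
]

print(run(program))                     # ({'a': 12, 'b': 3, 'c': 0}, 0)
print(run(program, track_cycles=True))  # ({'a': 12, 'b': 3, 'c': 0}, 23)
```

Both runs compute the same result. Only the second knows how much guest time passed, which is what software that relies on timing loops needs, and keeping that count faithful for a complex CPU is where much of the cost goes.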
Well, that makes PCem and 86Box very niche then. What should people actually use for emulating a recent x86 CPU like those found in the PS4/PS5/Xbox Series X and get good performance? Because this is where the human resources should be allocated, to enable support for hundreds of video games, including major console exclusives.
Should does not equal must. Wanting a better world with less suffering and more joy is not the same thing as imposing this world by force. A thought that should have occurred to you.
Emulating the CPU of those newer consoles would be quite slow. The practical approach would be a translation layer like Wine with many application-specific patches. The translation layer for the graphics would involve a lot of reverse engineering.