rafaquintanilha 2 hours ago [-]
I have no affiliation with them but here's what I think happened:
1. They claim the official model is based on Qwen 397B. It's likely they didn't disclose Nex Pro at all because Nex itself is based on the same base model (not saying they shouldn't).
2. The improvement would come from merging the weights PLUS on-policy distillation. The confusion is that the uploaded model didn't have the distillation at all.
3. It's important to notice they didn't advertise the model besides posting it on Reddit 2 days ago. It went viral organically, over the weekend, and during Brazil's World Cup debut (Brazilians will understand). Of course the mayor of Rio took the opportunity to capitalize on the free coverage, but that wasn't done in conjunction with the researchers.
4. I don't see why they would disclose Qwen 397B as base and mention the SwiReasoning paper but not mention Nex if all they did was to merge both models.
5. In any case, what they are claiming is easily verifiable once (if) they upload the right model.
I'm honestly impressed that this even happened at all. "Rio de Janeiro's homegrown LLM" is probably the last headline I ever expected to read on HN.
cscheid 28 minutes ago [-]
Yes! That "prefeitura do Rio" huggingface URL is definitely shocking to read to this Brazilian as well (I'm assuming you and parent also are from your usernames).
hintymad 4 hours ago [-]
> Every weight tensor in Rio is, to thousands of standard deviations, the same 0.6/0.4 blend of Nex and Qwen — across all 60 layers and every component of the network. Other finetunes cannot be explained as interpolations.
I find it amazing how robust the current deep learning models are. A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.
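For anyone wondering what "a simple linear combination of every weight" means in practice, here is a minimal sketch. It assumes both checkpoints share the same tensor names and shapes; the tensor names and values are made up, plain lists stand in for real tensors, and `merge_checkpoints` is a hypothetical helper, not anything from the Rio or Nex repos.

```python
from typing import Dict, List

Tensor = List[float]  # flattened weights; a stand-in for a real tensor type


def merge_checkpoints(a: Dict[str, Tensor], b: Dict[str, Tensor],
                      weight_a: float) -> Dict[str, Tensor]:
    """Return weight_a * a + (1 - weight_a) * b for every tensor."""
    # Merging only makes sense when both models have identical layouts.
    if a.keys() != b.keys():
        raise ValueError("checkpoints have different tensor names")
    merged: Dict[str, Tensor] = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [weight_a * x + (1.0 - weight_a) * y
                        for x, y in zip(a[name], b[name])]
    return merged


# Toy checkpoints with hypothetical tensor names and made-up values.
nex = {"layer0.w": [1.0, 2.0], "layer0.b": [0.5]}
qwen = {"layer0.w": [0.0, 1.0], "layer0.b": [0.0]}

# The 0.6/0.4 blend described in the quoted issue.
rio = merge_checkpoints(nex, qwen, weight_a=0.6)
```

The key and shape checks are the reason this only works between finetunes of the same base model: two unrelated models would fail them immediately.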
Aurornis 3 hours ago [-]
> A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.
Enhanced it on a couple benchmarks, supposedly.
The game is to turn knobs until you get a benchmark run that shows an improvement, then ship it. There are a lot of fine tunes and chimera models on HuggingFace that are supposedly better at some specific test, but when you use them for anything else they're usually worse.
This happens with a lot of the models that are modified to remove censorship. They succeed in getting the model to emit previously censored outputs, but the overall output quality decreases.
andai 3 hours ago [-]
They seem to have deleted most of the README now, but the archived version has benchmarks.
Rio seems to be about halfway between Qwen 3.5 and Nex, as you'd expect?
manquer 30 minutes ago [-]
> game is to turn knobs until you get a benchmark run that shows an improvement, then ship it
i.e. reinforcement learning against a weak reward function: the benchmark is insufficiently complex and not sufficiently representative of the real world.
The "game", i.e. the decision tree, can be modeled as a multi-armed bandit problem: deploying finite resources (compute) toward exploitation/exploration.
The main issue is each training / fine-tune is very expensive so number of chances at the slot so to speak is pretty limited today.
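To make the bandit framing concrete, here is a rough epsilon-greedy sketch. Everything in it is an assumption for illustration: each "arm" is a hypothetical merge/fine-tune recipe, `evaluate` stands in for an expensive benchmark run, and the scores are made up and noise-free.

```python
import random
from typing import Callable, List, Tuple


def epsilon_greedy(evaluate: Callable[[int], float], n_arms: int, budget: int,
                   epsilon: float, rng: random.Random) -> Tuple[List[int], List[float]]:
    """Spend `budget` evaluations across `n_arms` recipes; return counts and mean scores."""
    counts = [0] * n_arms
    means = [0.0] * n_arms

    def pull(arm: int) -> None:
        score = evaluate(arm)
        counts[arm] += 1
        # Incremental running mean of the scores seen for this arm.
        means[arm] += (score - means[arm]) / counts[arm]

    # Try every recipe once so each estimate has at least one sample.
    warmup = min(n_arms, budget)
    for arm in range(warmup):
        pull(arm)

    for _ in range(budget - warmup):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore a random recipe
        else:
            arm = max(range(n_arms), key=lambda i: means[i])  # exploit the best so far
        pull(arm)
    return counts, means


# Hypothetical benchmark scores for three recipes.
scores = [0.2, 0.5, 0.8]
counts, means = epsilon_greedy(lambda arm: scores[arm], n_arms=3, budget=12,
                               epsilon=0.25, rng=random.Random(0))
best_arm = max(range(3), key=lambda i: means[i])
```

With a budget of only 12 pulls you can see the point about cost: a quarter of the budget goes to just trying each recipe once.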
I don't believe this would work on two LLMs that have different pretraining. Even if it did you would need two LLMs that have exact same internal activation shapes, dimensions, expert counts, token vocabulary, realistically it would never happen outside of finetunes or academic experiments.
hashmap 2 hours ago [-]
not this exact thing, no, because the functional circuits dont appear in the same places across models. but if you find where they are you can do something like branch between some of the middle functional circuits between models and it kinda just works, or even do one after the other. you cant just like swap any two layers cause a bunch of em bend hyperbolic curvature to do hierarchical stuff deep in the poincare ball and the geometries get all bonkers, but before and after they do that things are relatively flat, and the geometries are more or less transferrable up to rigid rotation if they're each trained on large enough data.
oofbey 2 hours ago [-]
Correct. We used to think that because NN optimization is non-convex there are all these local minima. Now we know that once you get past the very early parts of training from random init, the loss surface is fairly smooth, and not really convex, but close enough in a bunch of ways - linear combinations of trained models are pretty much always valid combinations. You can think of fine tunings as deltas on the original model which can be summed together successfully. I think this paper first showed that to me: https://arxiv.org/pdf/1802.10026 which was 8 years ago now.
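A small sketch of the "fine-tunings as deltas" idea, sometimes called task vectors. The checkpoints are toy dictionaries with made-up values, and `task_vector` / `apply_task_vectors` are hypothetical helper names, not a real library API.

```python
from typing import Dict, List

Tensor = List[float]


def task_vector(finetuned: Dict[str, Tensor], base: Dict[str, Tensor]) -> Dict[str, Tensor]:
    """Delta between a fine-tune and the base model it was trained from."""
    return {name: [f - b for f, b in zip(finetuned[name], base[name])]
            for name in base}


def apply_task_vectors(base: Dict[str, Tensor], vectors: List[Dict[str, Tensor]],
                       scales: List[float]) -> Dict[str, Tensor]:
    """Add scaled deltas back onto the base weights."""
    out = {name: list(values) for name, values in base.items()}
    for vec, scale in zip(vectors, scales):
        for name in out:
            out[name] = [w + scale * d for w, d in zip(out[name], vec[name])]
    return out


# Toy base model and two fine-tunes that each moved a different weight.
base = {"w": [1.0, 1.0]}
tune_a = {"w": [1.5, 1.0]}
tune_b = {"w": [1.0, 0.5]}

delta_a = task_vector(tune_a, base)
delta_b = task_vector(tune_b, base)

# Summing both deltas keeps both changes.
combined = apply_task_vectors(base, [delta_a, delta_b], [1.0, 1.0])

# Scaling a single delta by 0.6 is the same as a 0.6/0.4 blend of tune_a and base.
blended = apply_task_vectors(base, [delta_a], [0.6])
```

Under this view, a 60/40 merge of a fine-tune with its own base is just the fine-tune's delta applied at 60% strength.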
woadwarrior01 4 hours ago [-]
It's a well-known idea[1], although it's still surprising that something so simple works at all.
> A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.
Which could be a signal that your "performance" was so abysmal in the first place that even randomly applied training methods can't make it _worse_.
randall 3 hours ago [-]
[dead]
meindnoch 3 hours ago [-]
It shows that LLMs are an extremely wasteful approach to intelligence.
kristjansson 3 hours ago [-]
or that intelligence is merely the composition of many redundant, lossy, ~random components
unrvl22 6 hours ago [-]
The municipality of Rio de Janeiro (via its IT company IplanRIO) released Rio-3.5-Open-397B, presented as a homegrown Qwen3.5 fine-tune that beats comparable open models on benchmarks. The linked issue argues it's actually a weighted merge of ~60% Nex-N2 Pro + ~40% Qwen3.5-397B-A17B - Nex-N2 having been released about a week earlier.
DonsDiscountGas 5 hours ago [-]
I didn't know model merging like that was possible. (Obviously possible from a pure software standpoint but I'm surprised it's effective)
it works because Nex N2 is also a derivative of the original base Qwen model. If it was two completely unrelated models it wouldn't work.
Lucasoato 5 hours ago [-]
So the problem isn’t in the missing attribution to Qwen, but with the fact that they didn’t mention Nex-N2 Pro right?
Aurornis 5 hours ago [-]
The problem is that they claimed to have made a big achievement with their home grown post training, and they expected to receive a lot of praise for it.
Then researchers looked at the weights and there is no post training at all.
They are now attributing both models they merged, but their excuse for the lack of post training is to claim they accidentally uploaded the wrong files.
serial_dev 3 hours ago [-]
I’d believe they accidentally uploaded the wrong files if they uploaded the correct ones. To state that they accidentally uploaded something else and then not upload the correct version means they probably do not have anything and either hope people forget about this or they are scrambling to have something that is at least close to their original claim.
clear-octopus 5 hours ago [-]
[dead]
zinodaur 5 hours ago [-]
Oh no, someone is profiting off of their work without proper attribution!?!?
Aurornis 5 hours ago [-]
This is an open weights model based on other open weights models.
The dispute is that they released it with claims about having done some post training that improved the outputs. It was discovered that the model was not post trained like they claimed.
The HF page now says it’s a merge of models, which wasn’t there before. They’re trying to claim they accidentally uploaded the wrong model to HF and that they’ll upload the real one soon.
Basically, they thought they could splice two open weights models together and claim their team had accomplished some amazing post training, but they weren’t smart enough to realize that other researchers would discover that there wasn’t any post training.
moritzwarhier 4 hours ago [-]
Thanks for the factual clarification. This is so important when everyone already has their trigger finger on politics. Not meaning that politics are irrelevant here, see sister comment by jobim.
But it's impossible to form a nuanced opinion when political association has a higher priority than the facts; which, again, don't look flattering for the implementers.
iknowstuff 4 hours ago [-]
How do they just splice two models together?
Aurornis 4 hours ago [-]
The Nex N2 model they merged is based on Qwen 3.5, so you can swap pieces of one into the other. They found a combination of the two that did well on some benchmarks and shipped it.
In the early days of Llama there were a lot of experiments like this. There were even some interesting combinations of models where they stacked layers of different models together or even added more layers with interesting results.
But announcing that you spliced two models together isn't very impressive in 2026, so they announced that they had done their own post training and outdid the big labs. They thought nobody would look close enough to notice.
ninja3925 4 hours ago [-]
Out of curiosity, how was it discovered? You would have to look for it to find this linear combination.
jdiff 3 hours ago [-]
Without the system prompt, asking its name results in it responding with the name of the model they're ripping from. That would certainly draw your eyes to the right places.
valleyer 3 hours ago [-]
Why is this? Do labs reinforce the model name during training? I was under the impression that this sort of "self-knowledge" always came from the system prompt, but I guess not...
jdiff 2 hours ago [-]
Yes. In this case, during fine tuning. Other blurbs are also baked in during fine tuning that are perfectly reproducible from the Nex model. The details inside the linked issue are quite accessible.
Aurornis 4 hours ago [-]
Check the linked GitHub issue. They explain their process.
Scroll past the first issue to find it. It’s further down.
internet2000 5 hours ago [-]
Attribution isn't the relevant part. Lying about your lab's capabilities is.
Planktonne 5 hours ago [-]
That's also something all the AI companies have been doing.
dofm 5 hours ago [-]
Lying about model capability is right now the lingua franca of the cloud AI business model, almost; they yes-and each other's lies because they are in a position of needing to generate interest, including going as far as needing to trigger regulatory capture.
(It's not news to anyone who has worked in sales-led businesses that salespeople are prone to believing the claims of other salespeople, I guess).
low_tech_love 3 hours ago [-]
They’re using public money to “train” this.
vips7L 3 hours ago [-]
Sounds like the whole AI movement.
themafia 3 hours ago [-]
It seems to me like the lies are both for the same reason. To capture attention and profits that are not deserved.
outside2344 5 hours ago [-]
But the whole game is lying and stealing isn't it?
functionmouse 5 hours ago [-]
leopards ate my face
adrian_b 5 hours ago [-]
I do not see anyone lying.
The model card says:
> Post-trained from Qwen 3.5 397B
The model card also says that they use an inference framework based on "SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs" by Shi et al.:
They only claim that what they did to "Qwen 3.5 397B" has improved the LLM, including, as expected, with "strong performance in Portuguese".
petu 5 hours ago [-]
That's attribution to Qwen team.
There (is/was) no attribution to Nex team (they've released a model based on Qwen 3.5 397B as well).
As per OP link Nex claims that what Rio team released (so far) is just linear interpolation of weights between Nex and OG Qwen model. With no attribution to Nex and zero signs of Rio doing any training of their own.
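For the curious, one way to test an "it's just a linear interpolation" claim is a least-squares fit per tensor. This is only a sketch under my own assumptions, not the exact procedure from the linked issue: `fit_blend` is a hypothetical helper and the vectors are toy values.

```python
import math
from typing import List, Tuple


def fit_blend(target: List[float], a: List[float], b: List[float]) -> Tuple[float, float]:
    """Best alpha for target ~ alpha * a + (1 - alpha) * b, plus the leftover residual norm."""
    # Rewrite as (target - b) ~ alpha * (a - b) and solve the 1-D least-squares problem.
    d = [x - y for x, y in zip(a, b)]
    t = [x - y for x, y in zip(target, b)]
    denom = sum(v * v for v in d)
    if denom == 0.0:
        raise ValueError("a and b are identical; alpha is undefined")
    alpha = sum(u * v for u, v in zip(t, d)) / denom
    residual = math.sqrt(sum((u - alpha * v) ** 2 for u, v in zip(t, d)))
    return alpha, residual


# Toy flattened tensors (made-up values).
nex = [1.0, 2.0, -1.0]
qwen = [0.0, 1.0, 1.0]

# A tensor that really is a 0.6/0.4 blend: alpha comes back as 0.6 with ~zero residual.
suspect = [0.6 * x + 0.4 * y for x, y in zip(nex, qwen)]
alpha_suspect, residual_suspect = fit_blend(suspect, nex, qwen)

# A tensor that moved somewhere else during training leaves a large residual.
independent = [0.5, 0.5, 2.0]
alpha_other, residual_other = fit_blend(independent, nex, qwen)
```

If every tensor in a model gives the same alpha with a residual near zero, there is no room left for any training that was done after the merge.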
00index 5 hours ago [-]
Are you talking about the credit that was just updated an hour ago? lol
carlosjobim 5 hours ago [-]
This is a pure scam on tax payer money. But what else would be expected?
hootz 3 hours ago [-]
Apparently no public money was involved.
jdiff 3 hours ago [-]
This is contrary to the mayor's words on Twitter.
> An open AI model trained in Rio with public funding over the last year by @Prefeitura_Rio surpassing all other models.
Unlike the big companies who do this, which often are merely impure scams on tax payer money a little more downstream.
philipallstar 4 hours ago [-]
Companies that generate loads of corporation tax, income tax, and VAT revenue are the exact opposite of wastes of public money.
jrm4 3 hours ago [-]
Yes, when they do so proportional to what they take, especially as compared to individuals and their tax liabilities.
You'll have to let me know when that finally happens, because that ain't now.
carlosjobim 5 hours ago [-]
Great, now we're defending embezzlement and fraud with public funds on HN, because we really really hate big business.
A child caught doing something bad will cry "but my friends also did it!", is that the level of reasoning hackers want to be at?
blanched 4 hours ago [-]
That seems like a bad faith read to me. Nobody is defending it, just pointing out the irony / hypocrisy. Two things can be bad, and they can be related.
sdevonoes 4 hours ago [-]
There are no hackers around here anymore. HN is mainly about business nowadays
dmix 4 hours ago [-]
HN has always discussed business
jrm4 4 hours ago [-]
What part of that said "defense?"
They can both be bad.
lostlogin 4 hours ago [-]
> Great, now we're defending embezzlement
I might be missing something, but I don’t see anyone defending the scams.
clear-octopus 5 hours ago [-]
[dead]
bachmeier 5 hours ago [-]
"Their work"? First you had the original content creators that did 99.99% of the work. Then you had the US companies bundle it up into a frontier LLM. Then "they" did the "work" of using the US model as a foundation for their own. So in the sense of doing 0.00001% of the actual work that went into their product, sure.
I'd say it's more like someone forking a Linux distro, adding a few themes and fonts, and then complaining when someone else forks their distro and adds another theme.
dghlsakjg 5 hours ago [-]
That’s the joke.
bachmeier 3 hours ago [-]
It isn't. The entirety of the comment I responded to is "Oh no, someone is profiting off of their work without proper attribution!?!?" It's a valid point, but references someone using content created by others for profit. I'm objecting to equating this project with the work done by the original content creators. They're not remotely the same thing.
I understand how the internet works and how people respond to others in this type of setting, but the comment I replied to did not in any way make the point I was making about the disproportionate nature of relative contributions.
idiotsecant 15 minutes ago [-]
It's time to stop digging
bwilliams18 5 hours ago [-]
That was the joke of the parent comment.
JoshStrobl 5 hours ago [-]
That joke really went over your head, huh...
harikb 5 hours ago [-]
It is only a problem if you claim it to be an independently developed OS with no attribution to base
idiotsecant 5 hours ago [-]
Oof this is delete your post level I think. Sorry bud, I been there.
jordz 4 hours ago [-]
Can someone please explain or link to some information about how models are merged? Is this genuinely merging weights mathematically or some kind of distillation (presumably not if they’ve done zero training as the post suggests).
But yes, in general, merging refers to techniques that directly blend the weights of different models mathematically. It had a big moment of popularity ~2 years ago, with many so-called "Frankenmodels" popping up on leaderboards.
I tend to think of merging as belonging to the same general umbrella as things like "abliteration", or other techniques that surgically modify the weights of a model without a traditional training/tuning loop. Maxime Labonne is a great person to follow if you're interested in this general area.
jrm4 5 hours ago [-]
“Well, Steve (Jobs), I think it’s more like we both had this rich neighbor named Xerox, and I broke into his house to steal the TV set, but I found out that you had already stolen it.”
-- Bill Gates
ckcheng 4 hours ago [-]
What’s more funny to me is the set up to that quote:
> Bill Gates had somehow manifested, alone, surrounded by ten Apple employees. … Steve started yelling at Bill, asking him why he violated their agreement.
And what’s more interesting is the conclusion:
> Apple filed a monumental copyright lawsuit against Microsoft in 1988, but they eventually lost on a technicality (the judge ruled that Apple inadvertently gave Microsoft a perpetual license to the Mac user interface in November 1985).
Microsoft didn’t steal Apple’s GUI … Apple gave it to them.
alexgoodhart 3 hours ago [-]
That isn’t fully true is it?
Microsoft claimed that its software’s use of various visualizations related to window state was covered by the 1985 agreement, and Apple claimed that this was not true; those window states were produced by Macintosh while Microsoft’s software was being rendered in the Mac environment.
> In his March 20, 1989 Order, Judge Schwarzer declined to consider whether the visual displays in issue were generated by the Microsoft application programs or by the Macintosh system software. The point arose in connection with Microsoft's argument that the 1985 Agreement licensed to Microsoft all visual displays that could possibly be called up by running the five Microsoft application programs on the Macintosh system software then or in the future. 709 F. Supp. at 929. Judge Schwarzer concluded that Microsoft's contention would "defy common sense." Id.
themafia 3 hours ago [-]
Two spoiled rich kids arguing over whose morality is the least worst.
That this moment is held up as some great exchange in business is annoying. That our regulatory agencies are perennially asleep at the switch and allow this nonsense to keep happening is extremely frustrating.
ChrisClark 2 hours ago [-]
Held up as some great exchange? No it's two assholes arguing with each other. Just like most Jobs documentaries show him as a terrible person.
Scroll_Swe 2 hours ago [-]
[flagged]
themafia 2 hours ago [-]
Let me guess, when confronted with uncomfortable information that requires you to think longer than you are used to, you devolve to false dichotomies to defend your ego?
wunderlotus 4 hours ago [-]
lmao i really hope this is a real quote cuz it’s a banger
>The model is built via a merge of https://huggingface.co/nex-agi/Nex-N2-Pro and https://huggingface.co/Qwen/Qwen3.5-397B-A17B, proceeded by On-Policy Distillation from a stronger model. We detected an incorrect upload in the previous version, where the base merged version was upload instead of the final distilled model. We are sorry for the confusion and apologize profusely.
Incidentally are people using Github issues as blogs now?
jonchurch_ 4 hours ago [-]
Edit: I didnt even notice until someone pointed out this was on the Nex-n2 repo not the rio one, now I understand the OP’s confusion!
It wasnt framed as an issue which is the norm breakage I think you’re reacting to, as in they didnt ask that the readme be updated etc, but it is common now for folks to use a project’s issue tracker to name and shame them in a place they cant easily ignore.
Whether that’s right, prosocial, or professional is up for debate (as well as if any single definition of etiquette can be expected in 2026 on an issue tracker).
But surely you can see the optics reason why someone would take their complaint to the repo directly? It pressures the maintainers to respond, it allows for a pile on from the internet, and makes any decision to lock down a hostile thread into its own kind of statement.
The maintainers should absolutely post an official response and lock the thread though, it will likely get ugly in there.
ChoosesBarbecue 4 hours ago [-]
But this is posted on Nex's GitHub, not on "Rio de Janeiro's" GitHub.
i.e. this is the maintainer posting on their own GitHub Issues.
fkozlowski 5 hours ago [-]
I'm honestly surprised that they even had the inclination to attempt creating a model. I guess it's bullish that a municipal IT department had the guts to try this?
Havoc 4 hours ago [-]
Merges and fine tunes are within reach of individuals with some money to burn so I’m sure a muni can do it
axus 3 hours ago [-]
I like the [dead] comment theory that they proposed a huge LLM training budget to the government, kept most of the money, and released a cheap merge to justify the grift.
seba_dos1 1 hours ago [-]
It's kinda weird to claim extraordinary results in such case though, as that brings a lot of eyes to it.
mgambati 10 minutes ago [-]
Nothing weird. The mayor wanted something to brag about. That's Rio, my friend.
matheusmoreira 1 hours ago [-]
That's essentially Brazil's standard operating procedure. Wouldn't be surprising if that turned out to be the case.
Still, I'm actually impressed that this even happened at all. "Rio de Janeiro's homegrown LLM" is the last headline I expected to read on HN.
MadrasTh0rn 5 hours ago [-]
Not surprised
nom 4 hours ago [-]
why not?
diego_moita 4 hours ago [-]
It is a recurrent Brazilian meme: Rio is known in Brazil as "terra de bandido" (gangster's land).
The majority of their politicians have ties to organized crime. There is a virtual revolving door between police and crime, where people migrate from one to the other.
It is like Chicago in the 20s, Naples and Medellín in the 80s or Moscow and Culiacán (Sinaloa, Mexico) today.
alexgoodhart 3 hours ago [-]
Somehow I doubt that political affiliations with crime syndicates are affecting heavily the dispositions of LLM developers. The industry itself though is one of incest.
sebastianconcpt 10 minutes ago [-]
Politicians don't come from outer space, they emerge locally and were raised swimming in an imaginary that has normalized the morals that eventually end up expressed at the top.
afh1 44 minutes ago [-]
He is calling into question the character of the public workers involved in the project, not saying that it has anything to do with organized crime. Rio has relapsed into crime in recent decades, and government workers in general have a reputation for corruption in Brazil. It's a low-trust society, especially north of Paraná, hence the lack of surprise.
ekjhgkejhgk 5 hours ago [-]
One funny thing about incompetence is that they don't have the competence to know that their incompetence is straightforward to verify by a competent person.
root-parent 5 hours ago [-]
You just described every single vibe coder...
vvpan 2 hours ago [-]
I think that's unfair to "vibe coding". If anybody explicitly claims to be vibe coding something, then they are admitting to low supervision of the code. And on the contrary, you can also AI-produce code that you have supervised highly. I suppose there are people who both AI their code and push it as bespoke but I, for one, have not met such a person at or outside of work.
root-parent 1 hours ago [-]
>> but I, for one, have not met such a person at or outside of work.
I wouldn’t describe what happened here as incompetence. As a “carioca”, I am pleasantly surprised to know that the government’s IT department is involved in AI work — even without the budget to create its own models from scratch.
reese_john 2 hours ago [-]
It is a testament to the bloat and overreach of the Brazilian state in the economy. Such endeavors should be left to the private sector
arcticfox 5 hours ago [-]
This seems kind of insane though, every time I go to Rio I think of the potential of AI/technology to solve some problems and leave it even more paradisiacal... But working on their own model? Wtf? There are a million applications of existing ones there that should be followed up on instead.
carlosjobim 5 hours ago [-]
Why would they care? They get their salaries and pensions and bonuses, and the tax payer is footing the bill.
AnotherGoodName 6 hours ago [-]
This is fascinating that it worked though. Can we just merge all the open weight models and get something better?
wds 5 hours ago [-]
I imagine it'd work the same as merging all the good-tasting foods to get an even tastier one
nylonstrung 4 hours ago [-]
If you go to Civitai, this is pretty much how it works in that corner of the image generation world
Everything is using Stable Diffusion as the underlying model, then most of the usage is merges of checkpoints
avereveard 5 hours ago [-]
most merges improve a small subset of "feeling" benchmarks (too small, too specific, or out of distribution) and tend to show degradation on actual benchmarks, with especially punishing results on long-chain benchmarks.
also, merges only work on matching architectures (i.e. finetunes/loras of the same model)
dindunuf 5 hours ago [-]
that kinda worked in llama 1/2 era, not between different models but between finetunes of the same model. the briefly legendary Mythomax was IIRC a merge of 5+ tunes, some of which were merges themselves.
_3u10 5 hours ago [-]
No, they need the same arch, but you can distill them into a single model. And yes, if you use the API directly Claude will often say it’s an open weight model (likely the ones it was distilled from)
delusional 3 hours ago [-]
It's absolutely insane to me that we are now at a point where the top of the front page of hacker news is a random GitHub issue about attribution to some random LLM merge, written in just the most disgusting AI slop style.
I would like to downvote this please.
yieldcrv 5 hours ago [-]
Didn’t the last thread about this have someone from the lab or an enthusiast in Rio saying exactly that?
It's a fine tune of Qwen
Not a conspiracy
daemonologist 5 hours ago [-]
The allegation here is that it's not actually a fine-tune of Qwen, but instead an undisclosed mashup (merge) of someone else's fine-tune of Qwen and the original model. Rio subsequently said that the model was in fact a merge, that they did additional fine-tuning after the merge, and that they accidentally uploaded the base merge instead of the version with additional fine-tuning. But this seems like quite an oversight...
yieldcrv 3 hours ago [-]
> But this seems like quite an oversight...
Not to me, what would people like to happen? Who are those people? And why do they care?
flowbarai 3 hours ago [-]
[flagged]
Aurornis 5 hours ago [-]
[dead]
antii 5 hours ago [-]
[dead]
diego_moita 4 hours ago [-]
WHAT!? There are thieves in Rio de Janeiro?
Oh, I am so SHOCKED, so SHOCKED! /s
Explaining the joke: in Brazil, Rio de Janeiro is known as "Terra de bandido" (Gangster's Land).
Kinda like Chicago in the 20's or Naples and Palermo in the 90s.
elzbardico 6 hours ago [-]
[flagged]
guiraldelli 5 hours ago [-]
Without evidence, your comment is just bad mouthing.
I have been involved in academia, including in Brazil, and I don't find academia there any more copycat than any other institution, including top tier ones.
boca_honey 5 hours ago [-]
This is very easy to prove [1][2]. Brazil has that reputation in the broader academic world, and it's for a reason.
One study about faculty hiring people they know, and the other about high school students cheating on assignments...
What was the original claim again?
dghlsakjg 5 hours ago [-]
This was a municipality working with a government associated IT company.
What does it have to do with Brazilian academia?
_3u10 5 hours ago [-]
No, typically Brazilians go to Paraguay for their education, most of their technology comes from Paraguay too.
matheusmoreira 2 hours ago [-]
No. We go to Paraguay to buy cheaper electronics.
knuppar 5 hours ago [-]
that's just a lie lol, stop spreading misinformation
cassiogo 5 hours ago [-]
What? Never heard of this
stymaar 5 hours ago [-]
That sounds like nonsense, they don't even speak the same language in Brasil and Paraguay …
Scroll_Swe 3 hours ago [-]
What else has South America overall contributed to?
Finland, a country invaded by both Nazis and Soviets and bombed to hell, made the Linux kernel and Nokia btw.
vvpan 2 hours ago [-]
Gross historical oversimplifications aside, I am wondering why one would use this as an opportunity to belittle a whole continent.
pelasaco 2 hours ago [-]
an eternal 7x1.. and I am not talking about Curaçao..
alfiedotwtf 5 hours ago [-]
Wasn’t it already obvious given the awfully familiar parameter numbers?
intoXbox 4 hours ago [-]
That only tells you what base architecture they used. Fine-tuning does not increase the number of weights; it just adapts the weights to perform better on a fine-tuning dataset, which is something they claimed they had done
Havoc 4 hours ago [-]
Nex in turn is also based on qwen so don’t think they’re too far off
https://news.ycombinator.com/item?id=48529544
https://web.archive.org/web/20260614082641/https://huggingfa...
And the Nex benchmarks for comparison
https://huggingface.co/nex-agi/Nex-N2-Pro
[1]: https://arxiv.org/abs/2203.05482
https://arxiv.org/abs/2510.05069
So the sources seem properly attributed.
They only claim that what they did to "Qwen 3.5 397B" has improved the LLM, including, as expected, with "strong performance in Portuguese".
There (is/was) no attribution to Nex team (they've released a model based on Qwen 3.5 397B as well).
As per OP link Nex claims that what Rio team released (so far) is just linear interpolation of weights between Nex and OG Qwen model. With no attribution to Nex and zero signs of Rio doing any training of their own.
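One simple way to test a "linear interpolation" claim like the one above is to fit a single blend coefficient by least squares and look at what is left over. The sketch below is an assumption about how such a check could be done, not the method used in the linked analysis; `fit_blend` is a hypothetical helper and the weights are toy values stored as flat lists.

```python
def fit_blend(candidate, model_a, model_b):
    """Least-squares alpha for candidate ~ alpha*A + (1-alpha)*B, plus relative residual."""
    num = den = 0.0
    for name in model_a:
        for c, a, b in zip(candidate[name], model_a[name], model_b[name]):
            num += (c - b) * (a - b)
            den += (a - b) ** 2
    alpha = num / den
    err = norm = 0.0
    for name in model_a:
        for c, a, b in zip(candidate[name], model_a[name], model_b[name]):
            err += (c - (alpha * a + (1 - alpha) * b)) ** 2
            norm += c ** 2
    return alpha, (err / norm) ** 0.5

# Toy weights (hypothetical); the candidate is built as an exact 0.6/0.4 blend.
model_a = {"w": [1.0, -2.0, 0.5], "b": [0.1, 0.3]}
model_b = {"w": [0.0, 1.0, 2.0], "b": [-0.4, 0.2]}
blended = {
    name: [0.6 * a + 0.4 * b for a, b in zip(model_a[name], model_b[name])]
    for name in model_a
}

alpha, residual = fit_blend(blended, model_a, model_b)
print(round(alpha, 6), residual < 1e-9)
```

A model that had received real additional training would generally leave a non-negligible residual after the best-fit blend is subtracted.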
> An open AI model trained in Rio with public funding over the last year by @Prefeitura_Rio surpassing all other models.
https://x.com/CavaliereRio/status/2065984620626129026
You'll have to let me know when that finally happens, because that ain't now.
A child caught doing something bad will cry "but my friends also did it!", is that the level of reasoning hackers want to be at?
They can both be bad.
I might be missing something, but I don’t see anyone defending the scams.
I'd say it's more like someone forking a Linux distro, adding a few themes and fonts, and then complaining when someone else forks their distro and adds another theme.
I understand how the internet works and how people respond to others in this type of setting, but the comment I replied to did not in any way make the point I was making about the disproportionate nature of relative contributions.
But yes, in general, merging refers to techniques that directly blend the weights of different models mathematically. It had a big moment of popularity ~2 years ago, with many so-called "Frankenmodels" popping up on leaderboards.
I tend to think of merging as belonging to the same general umbrella as things like "abliteration", or other techniques that surgically modify the weights of a model without a traditional training/tuning loop. Maxime Labonne is a great person to follow if you're interested in this general area.
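As a minimal sketch of the "blend the weights mathematically" idea, here is a linear merge over toy checkpoints. It assumes each checkpoint is a dict of flat lists with identical names and lengths; `linear_merge` is a hypothetical helper, and real merging tools support more elaborate methods than this plain weighted average.

```python
def linear_merge(state_dicts, weights):
    """Weighted elementwise average of checkpoints with identical names and sizes."""
    if len(state_dicts) != len(weights):
        raise ValueError("need one weight per checkpoint")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    names = set(state_dicts[0])
    if any(set(sd) != names for sd in state_dicts):
        raise ValueError("checkpoints have different tensor names")
    merged = {}
    for name in state_dicts[0]:
        tensors = [sd[name] for sd in state_dicts]
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"size mismatch for {name}")
        # zip(*tensors) walks the same position in every checkpoint
        merged[name] = [
            sum(w * x for w, x in zip(weights, values)) for values in zip(*tensors)
        ]
    return merged

# Toy checkpoints (hypothetical values).
model_a = {"w": [1.0, 2.0]}
model_b = {"w": [3.0, 4.0]}
merged = linear_merge([model_a, model_b], [0.5, 0.5])
print(merged)  # {'w': [2.0, 3.0]}
```

No gradient step or training data is involved, which is why this sits closer to weight surgery than to fine-tuning.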
-- Bill Gates
> Bill Gates had somehow manifested, alone, surrounded by ten Apple employees. … Steve started yelling at Bill, asking him why he violated their agreement.
And what’s more interesting is the conclusion:
> Apple filed a monumental copyright lawsuit against Microsoft in 1988, but they eventually lost on a technicality (the judge ruled that Apple inadvertently gave Microsoft a perpetual license to the Mac user interface in November 1985).
Microsoft didn’t steal Apple’s GUI … Apple gave it to them.
Microsoft claimed that its software’s use of various visualizations related to window state was covered by the 1985 agreement, and Apple claimed that this was not true; those window states were produced by Macintosh while Microsoft’s software was being rendered in the Mac environment.
> In his March 20, 1989 Order, Judge Schwarzer declined to consider whether the visual displays in issue were generated by the Microsoft application programs or by the Macintosh system software. The point arose in connection with Microsoft's argument that the 1985 Agreement licensed to Microsoft all visual displays that could possibly be called up by running the five Microsoft application programs on the Macintosh system software then or in the future. 709 F. Supp. at 929. Judge Schwarzer concluded that Microsoft's contention would "defy common sense." Id.
That this moment is held up as some great exchange in business is annoying. That our regulatory agencies are perennially asleep at the switch and allow this nonsense to keep happening is extremely frustrating.
https://www.folklore.org/A_Rich_Neighbor_Named_Xerox.html
>The model is built via a merge of https://huggingface.co/nex-agi/Nex-N2-Pro and https://huggingface.co/Qwen/Qwen3.5-397B-A17B, proceeded by On-Policy Distillation from a stronger model. We detected an incorrect upload in the previous version, where the base merged version was upload instead of the final distilled model. We are sorry for the confusion and apologize profusely.
Incidentally are people using Github issues as blogs now?
It wasn’t framed as an issue, which is the norm breakage I think you’re reacting to (as in, they didn’t ask that the readme be updated, etc.), but it is common now for folks to use a project’s issue tracker to name and shame them in a place they can’t easily ignore.
Whether that’s right, prosocial, or professional is up for debate (as well as if any single definition of etiquette can be expected in 2026 on an issue tracker).
But surely you can see the optics reason why someone would take their complaint to the repo directly? It pressures the maintainers to respond, it allows for a pile on from the internet, and makes any decision to lock down a hostile thread into its own kind of statement.
The maintainers should absolutely post an official response and lock the thread though, it will likely get ugly in there.
i.e. this is the maintainer posting on their own GitHub Issues.
Still, I'm actually impressed that this even happened at all. "Rio de Janeiro's homegrown LLM" is the last headline I expected to read on HN.
The majority of their politicians have ties to organized crime. There is a virtual revolving door between police and crime, where people migrate from one to the other.
It is like Chicago in the 20s, Naples and Medellín in the 80s or Moscow and Culiacan (Sinaloa, Mexico) today.
https://news.ycombinator.com/item?id=48516679
Everything uses Stable Diffusion as the underlying model, and most of the usage is of merged checkpoints
also only work on matching architectures (i.e. finetunes/loras of the same model)
I would like to downvote this please.
It's a fine tune of Qwen
Not a conspiracy
Not to me, what would people like to happen? Who are those people? And why do they care?
Oh, I am so SHOCKED, so SHOCKED! /s
Explaining the joke: in Brazil, Rio de Janeiro is known as "Terra de bandido" (Gangster's Land).
Kinda like Chicago in the 20's or Naples and Palermo in the 90s.
I have been involved in academia, including in Brazil, and I don't find academia there any more copycat than any other institution, including top tier ones.
[1] https://www.sciencedirect.com/science/article/abs/pii/S17511...
[2] https://www.scielo.br/j/aac/a/xNytDrrrHdyK4XPcHBRJZmd/?lang=...
What does it have to do with Brazilian academia?
Finland, a country invaded by both Nazis and Soviets and bombed to hell, made the Linux kernel and Nokia btw.