## Some thoughts on ranking systems

### Re: Some thoughts on ranking systems

Why isn't data from multiple periods used to calculate strength of opponents (general strength, not SDO)? Is it because the formula needs to be consistent, or easy to understand?

I see that the necklace is an award for the best performer of the period, but you wouldn't usually expect the strength of her opponents to change largely between periods, which is what happens with SDO / VP / whatever pumping.

### Re: Some thoughts on ranking systems

Thank you very much. So, SVDO passes all of the qualifying conditions I laid out. Furthermore, I feel that using CVP as the necklace group selection criterion may have been better than the current selection process. In a way, I am pleased to know that the 2011 situation did NOT result in the necklace group match winner getting the necklace, as this suggests there is a good balance between SVDO and NGP. So, folks, do you prefer the SVDO system to the SDO system? Here are the pros and cons that I can think of. Please point out any mistakes or missing pros or cons.

> Shmion84 wrote: [Spoiler: saving space] The maximum was marked in bold text.

> maglor wrote: OK, folks. I need some help. Here are the stats needed to advance or squash this.

For 2011, 2012, and 2013 Aquamarine, please tell me

(1) CVP

(2) SVDO = SDOVP = Sum of Vote Points of the defeated Opponents

(3) Necklace Score = SVDO + 50 * NGP, where NGP = Necklace Group Performance = (votes received by the contestant in the 7-character necklace group match) / (total number of votes cast in the necklace group match). Since a multiplier of 100 risks weighting NGP too heavily, I thought 50 would be more appropriate.
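A minimal sketch of the Necklace Score definition above, assuming NGP is a plain vote share between 0 and 1. The function names and the sample numbers are hypothetical and only illustrate the arithmetic.

```python
def necklace_group_performance(votes_for: int, total_votes: int) -> float:
    """NGP: share of the necklace group match votes received by one contestant."""
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    return votes_for / total_votes


def necklace_score(svdo: float, ngp: float, multiplier: float = 50.0) -> float:
    """Necklace Score = SVDO + multiplier * NGP (multiplier 50 as proposed above)."""
    return svdo + multiplier * ngp


# Hypothetical contestant: SVDO of 47.5 and 500 of 2000 necklace match votes.
ngp = necklace_group_performance(500, 2000)
print(necklace_score(47.5, ngp))         # multiplier 50
print(necklace_score(47.5, ngp, 100.0))  # multiplier 100 doubles the NGP term
```

With SVDO for 7 win characters reported between 45 and 50, the multiplier decides how much of the final score the group match can move.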

> maglor wrote: What we need to see is
>
> (1) SVDO for 7 win characters be higher than those for 6 win characters in almost all the cases in these 3 aquamarine periods.

Yes. SVDO for 7 win characters is pretty constant (between 45 and 50 points).

> maglor wrote: (2) Final Necklace Score ranking be different from SVDO ranking in at least one place in at least two out of three years.

Yes. For example: only in 2011 the character with the highest SVDO got the highest Final Necklace Score.

> maglor wrote: (3) Final Necklace Score ranking using SVDO be different from Necklace Score Ranking we used for the year in question in at least one place in at least two out of three years.

Yes. For example: In 2011 Akiyama Mio gained 3rd rank under old rules and 5th rank under proposed rules. In 2012 Shana gained 5th rank under old rules and 4th rank under proposed rules. In 2013 Kuroyukihime gained 3rd rank under old rules and 6th rank under proposed rules.

> maglor wrote: It would be interesting to see if any of the 7 would have been different if necklace group selection was (1) number of wins (2) CVP.

In 2011 Nakano Azusa (CVP 8.87895438) and Gokō Ruri (Kuroneko) (CVP 8.705217814) would have replaced Nakamura Yuri and Eucliwood Hellscythe.

In 2012 no changes (exact 7 characters with 7 wins).

In 2013 Yūki Asuna (CVP 9.154928839) would have replaced Kōsaka Kirino.

You might double check these calculations.

Pro

1. The relative difference in SVDO is less compared to SDO.

2. at least it isn't SDO

Con

1. There is now even more reason to attempt SVDO manipulation: you can further increase SVDO by voting for a strong opponent that your favorite beat, so that the strong opponent beats weaker contestants even harder.

2. This greatly reduces the chance for a 6 win character to win the necklace over a 7 win character. If we want to increase that chance, we can raise the multiplier for NGP to 100, but then we risk SVDO becoming too insignificant, so that even a 5 win character would have a chance to win the necklace over a 7 win character if the 5 win character gets into the necklace match due to some extraordinary circumstances. The main reason for all this is that 6 or 7 win characters can have very low or very high SDO values, while the usual range of SVDO values is much narrower.

Possible Alternative

Instead of SVDO, we can use AVDO = average of cumulative vote points (CVP) of defeated opponents. This will increase the chance that a 5 or 6 win character can win the necklace. The multiplier for NGP will have to be greatly reduced, likely to 2 to 4 instead of 50 or 100, since I expect the difference in AVDO between contestants of adjacent AVDO rank to be near 0.1, thus 0.1/0.035 ~ 3. The good AND bad thing about this is the increased chance that a 5 or 6 win character will win the necklace. Do note that a 7 win character will still have a higher AVDO than 6 win characters most of the time, because that 7th win would have come from someone very strong.
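The multiplier estimate above can be sketched as follows, assuming the two typical gaps quoted in the post (about 0.1 for adjacent AVDO ranks, about 0.035 for adjacent NGP ranks). The function names and the sample CVP values are hypothetical.

```python
from statistics import mean


def avdo(defeated_cvps: list[float]) -> float:
    """AVDO: average CVP of the opponents a character defeated."""
    if not defeated_cvps:
        raise ValueError("AVDO is undefined for a winless character")
    return mean(defeated_cvps)


def suggested_multiplier(avdo_gap: float = 0.1, ngp_gap: float = 0.035) -> float:
    """Multiplier that makes one NGP rank step worth about one AVDO rank step."""
    return avdo_gap / ngp_gap


print(round(suggested_multiplier(), 2))  # close to 3, inside the 2 to 4 range
print(avdo([8.0, 9.0]))                  # hypothetical CVPs of two defeated opponents
```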

### Re: Some thoughts on ranking systems

> Midnight-Jasper wrote: Why isn't data from multiple periods used to calculate strength of opponents (general strength, not SDO)? Is it because the formula needs to be consistent, or easy to understand?
>
> I see that the necklace is an award for the best performer of the period, but you wouldn't usually expect the strength of her opponents to change largely between periods, which is what happens with SDO / VP / whatever pumping.

The necklace system is deliberately designed to be specific only within a period, so someone like Sakurazaki Setsuna (2008 Topaz), Katsura Hinagiku (2009 Ruby), or Eucliwood (2011 Diamond) won't need to worry about records from prior periods dragging them or their opponents down.

### Re: Some thoughts on ranking systems

> maglor wrote: 2. This greatly reduces chance for 6 win character to win the necklace over a 7 win character.

I think this is a good thing, because I don't believe a character deserves to win the title of that period if she can't even manage a 7-0.

I took another look at Emerald 2013, where the highest necklace match VF did not belong to the character with the highest SDO.

Eligible characters:

- ~~Tachibana Kanade 10.437~~
- ~~Gokou Ruri 9.682~~
- Misaka Mikoto 9.453
- Shiina Mashiro 9.321
- Hasegawa Kobato 8.862
- Kuroyukihime 8.757
- ~~Yuuki Asuna 8.734~~
- Aragaki Ayase 8.701
- Takanashi Rikka 8.624
- Eucliwood Hellscythe 8.570
- ~~Shana 8.482~~
- ~~Nakano Azusa 8.454~~
- ~~Aisaka Taiga 8.401~~
- ~~Akiyama Mio 8.020~~

Eucliwood Hellscythe would have replaced Shana for having a higher CVP.

[Spoiler: The necklace score based on different factor onto the Necklace V…]

### Re: Some thoughts on ranking systems

Thank you. These are very informative. One thing I keep noticing is how low SVDO is for Misaka Mikoto or Kuroyukihime. While you can significantly lower your rival's SDO with strategic voting if you get lucky, you can't really lower SVDO, since a vote against a character has a very small impact on that character's CVP. This means that the reason for Mikoto and Kuroyukihime having such a low SDO may have more to do with their opponents losing their fanbase votes than with a faction actively trying to vote against them.

### Re: Some thoughts on ranking systems

On AVDO:

(without winless characters)

You can clearly see: SVDO (7 wins) > SVDO (6 wins) > SVDO (5 wins) > ...

AVDO (7 wins) is often, but not always, higher than AVDO (6 wins). So characters with 6 wins have a chance to win the necklace (which I prefer). The difference in AVDO between contestants of adjacent AVDO rank is just below 0.1.

This is not the first discussion on necklace rules, right?

PS: I missed maglor's 4444th post.

[Graphic: Aquamarine 2011, 2012, 2013]

### Re: Some thoughts on ranking systems

> Shmion84 wrote: This is not the first discussion on necklace rules, right?

Thanks again. There have been many discussions on the necklace rules, but I don't think any of the alternatives proposed has received this amount of analysis by multiple people. Feel free to check out this page -> http://internationalsaimoe.com/forum/vi ... f=4&t=3906 and tell me if you see anything to your liking.

How to balance the need to properly reward 7 win characters while giving 6 win characters a chance is a tough question. We may need to reverse engineer this. The question I would like to pose to all of you is, "What VF% lead should a 6 win character have over a 7 win character in the necklace group match in order for that 6 win character to deserve the necklace?" We need to try to reach an agreement on this value, as it will help us narrow down the range of models and variables to use.

Assuming we have a good idea of the VF% difference to aim for, I do see some ways to lessen the gap between 6 and 7 win characters. The 2 easiest ways would be NP = SVDO + alpha*AVDO + NGP or NP = SVDO + alpha*SVAO + NGP, where SVAO = sum of CVP of all opponents. Since it is possible for 6 win characters to have a higher AVDO or SVAO, we can find the alpha value that requires 6 win characters to win the necklace group match by the desired VF% or more.

By the way, a tie should be handled as a half win in calculating the win percentage, and half of that opponent's CVP will be used in calculating SVDO or AVDO in this scoring system. For SVAO, whether you win, tie, or lose doesn't matter, so the full CVP value should be used.
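The tie handling described above can be sketched like this. The `MatchResult` record, the outcome labels, and the sample CVP values are hypothetical. How ties should enter the AVDO denominator is not specified in the post, so AVDO is left out.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    opponent_cvp: float
    outcome: str  # "win", "tie", or "loss"


# A tie counts as half a win; a loss contributes nothing to SVDO.
OUTCOME_WEIGHT = {"win": 1.0, "tie": 0.5, "loss": 0.0}


def svdo(results: list[MatchResult]) -> float:
    """Sum of CVP of defeated opponents, with half credit for ties."""
    return sum(OUTCOME_WEIGHT[r.outcome] * r.opponent_cvp for r in results)


def svao(results: list[MatchResult]) -> float:
    """Sum of CVP of all opponents, regardless of outcome."""
    return sum(r.opponent_cvp for r in results)


def win_percentage(results: list[MatchResult]) -> float:
    """Win percentage with ties counted as half wins."""
    return sum(OUTCOME_WEIGHT[r.outcome] for r in results) / len(results)


period = [MatchResult(8.0, "win"), MatchResult(6.0, "tie"), MatchResult(9.0, "loss")]
print(svdo(period), svao(period), win_percentage(period))
```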

### Re: Some thoughts on ranking systems

I did some rough calculations with the goal of a 10% hurdle, that is, the gap a strong 6 win contestant would need over a weak 7 win contestant in the necklace group match. The formula that seems to work for this is

NP = SVDO + SVAO + 100*NGP, where 100 is the multiplier for the necklace group match result. I'm afraid this formula means nothing if we can't agree on a hurdle height of 10%.

### Re: Some thoughts on ranking systems

So I read some of the 'Necklace and SDO Discussion' thread. I get the feeling it is all about an arms race between ISML staff and several factions. Even though manipulations should be prevented, the system should never be too complicated or develop negative side effects (such as a vote for your favourite character in a match worsening her rating).

In 2012 I collected some examples for the SDO: http://internationalsaimoe.com/forum/vi ... 93#p172165

The AVDO has a similar problem: Your favourite character's AVDO will decline, if she wins against a weak character with a low CVP. She has to lose to keep her old (better!) AVDO.
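A small worked example of this effect, using hypothetical CVP values: a win over a weak opponent raises the sum (SVDO) but lowers the average (AVDO).

```python
from statistics import mean


def svdo(defeated_cvps: list[float]) -> float:
    """Sum of CVP of defeated opponents."""
    return sum(defeated_cvps)


def avdo(defeated_cvps: list[float]) -> float:
    """Average CVP of defeated opponents."""
    return mean(defeated_cvps)


wins_so_far = [9.0, 8.5, 9.5]      # hypothetical CVPs of opponents already beaten
with_weak_win = wins_so_far + [6.0]  # one more win, against a low-CVP opponent

print(svdo(wins_so_far), svdo(with_weak_win))  # the sum goes up
print(avdo(wins_so_far), avdo(with_weak_win))  # the average goes down
```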

Back on topic:


> maglor wrote: "What should be the optimal VF% difference should 6 win character have over a 7 win character in the necklace group match in order for that 6 win character to deserve the necklace? "

I would say about 5 %.

I think a "good" ranking system should

- not be too complicated

- not punish votes for a character (whose rank drops as a result)

- not be too vulnerable to coincidences (differences in SVDO between characters with the same number of wins were small)

Any new system should be tested intensively on earlier Periods.

Note: We can comply with the rule of heraldry if Aquamarine is second period and Topaz is fourth period.

### Re: Some thoughts on ranking systems

> Shmion84 wrote: I would say about 5 %.

Thank you for your input. For a 5% margin, since the difference in SVDO between 7 win characters and 6 win characters is occasionally about 5, using a multiplier of 100 for NGP seems appropriate.

For simplicity's sake, I am now leaning towards NP = SVDO + 100*NGP

Compared to the current system, which awards the loser points as (loser's VF)/(winner's VF), a vote for the winner will have less direct impact, as this will be replaced by (loser's VF)/(average VF of all contestants on that match day).

As for the difference in SVDO value, its practical significance is much greater than its relative difference suggests. As I mentioned before, the average difference between adjacent finish ranks for NGP is about 0.03 to 0.04, so any difference needs to be normalized by dividing it by 0.035. In this sense, a difference in SVDO can be significant enough. This is why we also need to think about the multiplier.

I think we now need some measure of the quality of the scoring/ranking systems, thus I introduce you to SARD = Sums of Absolute Rank Difference.

For current necklace point system, SARD = ( sum of absolute values of (NP rank - SDO rank) + sum of absolute values of (NP rank - NGP rank ) )/2

We want a system that maximizes SARD.

I am considering the following 3 challengers to the current NP system, NPC = SDO/3 + 100*NGP:

(1) NP1 = SVDO + 50 * NGP

(2) NP2 = SVDO + 100*NGP

(3) NP3 = SVDO + SVAO + 50* NGP

SARD for NP1 and NP2 will be ( sum of absolute values of (NP rank - SVDO rank) + sum of absolute values of (NP rank - NGP rank ) )/2

SARD for NP3 will be ( sum of absolute values of (NP rank - SVDO rank) + sum of absolute values of (NP rank - SVAO rank) + sum of absolute values of (NP rank - NGP rank ) )/3
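The SARD definitions above can be sketched as one function that averages over however many components the model has (two for NPC, NP1 and NP2, three for NP3). The rank lists are hypothetical and are not real ISML data.

```python
from typing import Sequence


def abs_rank_diff(np_ranks: Sequence[float], component_ranks: Sequence[float]) -> float:
    """Sum of |NP rank - component rank| over all contestants in one period."""
    if len(np_ranks) != len(component_ranks):
        raise ValueError("rank lists must cover the same contestants")
    return sum(abs(a - b) for a, b in zip(np_ranks, component_ranks))


def sard(np_ranks: Sequence[float], *components: Sequence[float]) -> float:
    """Average of the per-component sums of absolute rank differences."""
    if not components:
        raise ValueError("need at least one component ranking")
    return sum(abs_rank_diff(np_ranks, c) for c in components) / len(components)


# Hypothetical ranks for a 7-character necklace group.
np_ranks = [1, 2, 3, 4, 5, 6, 7]
svdo_ranks = [2, 1, 3, 4, 5, 7, 6]
ngp_ranks = [3, 2, 1, 4, 6, 5, 7]

print(sard(np_ranks, svdo_ranks, ngp_ranks))  # 5.0
```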

A nice thing about the SARD measurement is that the central limit theorem will likely punish the SARD score of a more complicated model.

Since we are merely comparing fixed models without fitting any coefficient to the data, I don't think we need more complicated validation techniques such as cross-validation, estimating uncertainty through a linear weight matrix, or bootstrapping. Still, I think we do need the average SARD value for ALL four models, using our entire sample space, which is all the necklace periods in 2011, 2012, and 2013. Please note that for 2011 we will be using the same NPC as in 2012 and 2013, not the NP system used in 2011.

To judge significance, we need the average SARD value AND the STDEV of the SARD value for each model. The standard procedure is to divide STDEV by sqrt(number of samples = 7+5+5 = 17), which gives STERR, and then construct a 95% confidence interval [average - 2*STERR, average + 2*STERR] for each of the 4 models. Let's not be too concerned with multiple comparison issues yet, since we are using only 4 models. If any model has a SARD confidence interval (CI) that does not overlap with any other model's SARD CI, we can say with very high confidence that its SARD value is significantly different, and if it is significantly higher we really need to consider that model superior to the others.

I know this will take a good deal of computer work, and we will fill a great amount of visual space with tables and tables of numbers, but this is a task worth taking on if we are to have any credible case for changing the necklace point system.

The above method is very important. We must consider average SARD value in any discussion of changes to NP system.
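The confidence interval procedure can be sketched as follows, assuming STDEV means the sample standard deviation (as in a spreadsheet STDEV function). The per-period SARD values are hypothetical placeholders.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_interval(values: list[float], z: float = 2.0) -> tuple[float, float]:
    """[average - z*STERR, average + z*STERR], with STERR = STDEV / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to estimate STDEV")
    average = mean(values)
    sterr = stdev(values) / sqrt(n)
    return average - z * sterr, average + z * sterr


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical per-period SARD values for two models.
model_a = [4.0, 6.0, 8.0, 6.0]
model_b = [10.0, 11.0, 12.0, 11.0]

ci_a = confidence_interval(model_a)
ci_b = confidence_interval(model_b)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```

Only when the intervals do not overlap would the criterion above call the difference significant.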

P.S. : Care to tell us more about the rule of Heraldry ?

P.S.2 : I have been thinking more about SARD and believe MSARD = minimum of sums of absolute rank difference might be a better quantity to maximize. The following are the equations for MSARD for the 4 models at hand.

NPC : min( sum of absolute values of (NP rank - SDO rank) , sum of absolute values of (NP rank - NGP rank) )

NP1 & NP2 : min( sum of absolute values of (NP rank - SVDO rank) , sum of absolute values of (NP rank - NGP rank ) )

NP3 : min (sum of absolute values of (NP rank - SVDO rank) , sum of absolute values of (NP rank - SVAO rank) , sum of absolute values of (NP rank - NGP rank ) )

One problem with this is that using the minimum may too severely penalize a system for having more components. Still, this optimization strategy does a better job of making sure all the components in the NP system contribute in a meaningful way. If time allows, we should consider whether using MSARD leads to the selection of a different system compared to using SARD.
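A minimal sketch of MSARD, with hypothetical rank lists, which also shows the penalty mentioned above: adding a component can only keep the minimum the same or lower it.

```python
from typing import Sequence


def abs_rank_diff(np_ranks: Sequence[float], component_ranks: Sequence[float]) -> float:
    """Sum of |NP rank - component rank| over all contestants in one period."""
    return sum(abs(a - b) for a, b in zip(np_ranks, component_ranks))


def msard(np_ranks: Sequence[float], *components: Sequence[float]) -> float:
    """Minimum over components of the sum of absolute rank differences."""
    if not components:
        raise ValueError("need at least one component ranking")
    return min(abs_rank_diff(np_ranks, c) for c in components)


# Hypothetical ranks for a 7-character necklace group.
np_ranks = [1, 2, 3, 4, 5, 6, 7]
svdo_ranks = [2, 1, 3, 4, 5, 7, 6]
ngp_ranks = [3, 2, 1, 4, 6, 5, 7]
svao_ranks = [1, 2, 3, 4, 5, 7, 6]

print(msard(np_ranks, svdo_ranks, ngp_ranks))              # two components
print(msard(np_ranks, svdo_ranks, svao_ranks, ngp_ranks))  # three components
```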

P.S. 3 : I do know that since we have a parameter to optimize for, we could make this a true optimization problem by using models like

NPV = SVDO + C1*NGP

and then letting the optimizer find us the best C1. We could even use a cross-validation scheme to select whether the model using SDO, SVDO, or SVDO+SVAO is the best, by splitting the 17 necklace periods into 4 sets and letting the optimizer find the best coefficient value for each case. The reason I avoided the temptation to do this is that it would result in coefficient values that look like 73.5246, which is a more difficult number to grasp than the simple 300 (for the current system, the NPC), 100, or 50 being discussed here.

### Re: Some thoughts on ranking systems

> maglor wrote: The necklace system is deliberately designed to be specific only within a period, so someone like Sakurazaki Setsuna(2008 Topaz), Katsura Hinagiku (2009 Ruby), or Eucliwood (2011 Diamond ) won't need to worry about records from prior periods dragging them or their opponents down.

But if my memory is correct, Eucliwood was 5-2 with SDO 57 in 2011 Diamond, behind *Suzumiya Haruhi* in SDO and *Nakano Azusa* in SoS.

### Re: Some thoughts on ranking systems

(1) At first I calculated SARD and MSARD for the current, fractional SDO-system.

**SARD**

Average: 7.029

Standard deviation: 1.851

Standard error: 0.449

95.5 % confidence interval: 0.953 - 2.749

**MSARD**

Average: 4.647

Standard deviation: 2.588

Standard error: 0.628

95.5 % confidence interval: 1.332 - 3.844

[Full graphic]

ABS (1) = sum of absolute values of (NP rank - SDO rank)

ABS (2) = sum of absolute values of (NP rank - NGP rank)

(2) I also calculated NP1, NP2 and NP3 for Aquamarine 2011. I have doubts about NP3:

NP3 = SVDO + SVAO + 50 * NGP

For 7 win characters SVDO (= sum of CVP of defeated opponents) and SVAO (= sum of CVP of all opponents) are identical. In this case NGP has only half as much influence as in NP1!

However, I uploaded the spreadsheet to Google Docs. I have not taken draws into account. You can check for errors (and look for easier functions):

https://docs.google.com/spreadsheets/d/ ... sp=sharing

SARD and MSARD values in BU1:BW4.

(3) I am no expert on heraldry. Last year I searched about the inheritance of coats of arms and found this: https://en.wikipedia.org/wiki/Tincture_(heraldry). 6 of 7 tinctures were connected to necklace gemstones:

**Metals**

Gold/Yellow - Topaz

Silver/White - Pearl

**Colours**

Blue - Sapphire

Red - Ruby

Purple - Amethyst

Black - Diamond

Green - Emerald

I thought this would have been intentional and Pearl was replaced by Aquamarine. Maybe it is only a coincidence.

However, Wikipedia mentions the **rule of tincture**: *metal must never be placed upon metal, nor colour upon colour*

This rule also applies to many flags. Look at: https://commons.wikimedia.org/wiki/Sove ... tate_flags (many tricolour flags like Austria, Belgium, France, Ireland, Italy, Netherlands, or the Nordic Cross flags of Denmark, Finland, Iceland, Norway and Sweden)

If we count Aquamarine as Silver/White (Pearl) the necklace gemstones consist of 2 metals (Aquamarine and Topaz) and 3 colours (Amethyst, Ruby and Emerald). The arrangement of necklace periods can comply with the rule using any colour - metal - colour - metal - colour combination.

### Re: Some thoughts on ranking systems


Thank you. You had a question about NP3, specifically why SVAO was added to the mix. The idea is that 6 win characters will likely have lower SVDO but slightly greater SVAO compared to 7 win characters. By adding SVAO, I am giving 6 win characters a slightly better chance to overcome the SVDO deficit compared to NP1. It is also a test to see whether our NP models are too simple or not. If we see a large increase in SARD value with SVAO in the mix, we may need to more seriously consider adding SVAO or some other 3rd factor to the NP formula. For 2011 Aquamarine, the SARD value for NP3 is large because of Kanade. I still expect NP3's SARD value to be comparable to or below other models when we bring in other periods and other years.

One thing of note: I think you have a typo in your spreadsheet, as the CI should be Average +- 2*STERR, but you did STDEV +- 2*STERR.

By the way, nice catch about this being a 95.5% CI, since a more formal 95% CI would have used 1.96 instead of 2. I didn't bother listing 1.96, since I didn't think we needed that much formality, and since we have a multiple comparison situation, we could have considered an alpha as low as 5%/(4 choose 2 = 6) = 0.83%, which may be too conservative for our needs.
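As a rough check of the points above, this sketch recomputes the SARD interval from the summary values reported earlier in the thread (average 7.029, standard deviation 1.851, 17 periods) with both multipliers, and evaluates the pairwise-comparison alpha. The function names are hypothetical.

```python
from math import comb, sqrt


def ci_from_summary(average: float, stdev: float, n: int, z: float) -> tuple[float, float]:
    """[average - z*STERR, average + z*STERR], with STERR = stdev / sqrt(n)."""
    sterr = stdev / sqrt(n)
    return average - z * sterr, average + z * sterr


def pairwise_alpha(alpha: float, n_models: int) -> float:
    """Alpha divided by the number of model pairs (n_models choose 2)."""
    return alpha / comb(n_models, 2)


ci_z2 = ci_from_summary(7.029, 1.851, 17, z=2.0)     # Average +- 2*STERR
ci_z196 = ci_from_summary(7.029, 1.851, 17, z=1.96)  # formal 95% interval
print([round(x, 3) for x in ci_z2])
print([round(x, 3) for x in ci_z196])
print(round(pairwise_alpha(0.05, 4), 4))
```

The z = 2 interval comes out near 6.13 to 7.93, and the 1.96 version is only slightly narrower.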

### Re: Some thoughts on ranking systems

> maglor wrote: For 2011 Aquamarine, SARD value for NP3 is large because of Kanade. I still expect NP3's SARD value to be comparable to or below other models when we bring in other periods and other years.

I think it is because of Eucliwood Hellscythe's and Nakamura Yuri's combination of high SVAO (both ranked #1) and low NP3 (ranked #6 and #7, respectively).

> maglor wrote: One thing of note. I think you have a typo in your spread sheet as CI should be Average +- 2* STERR, but you did STDEV +- 2* STERR .

You are right. Thanks for your attention. The correct values:

SARD-CI: 6.132 - 7.927

MSARD-CI: 3.392 - 5.903

### Re: Some thoughts on ranking systems

> Shmion84 wrote: I think it is because of Eucliwood Hellscythe's and Nakamura Yuri's combination of high SVAO (both ranked #1) and low NP3 (ranked 6# resp. 7#).

Yeah, you are right about Eucliwood and Yuri. In case of a tie in the rank, use the average. In this case, it would have been better to give Eucliwood and Yuri a rank of 1.5 for SVAO rank. The reason is that using the mean will likely reduce the overall variance.

### Re: Some thoughts on ranking systems

> maglor wrote: In case of tie in the rank, use the average. In this case, it would have been better to give Eucliwood and Yuri rank of 1.5 for SVAO rank. The reason is that using the mean will likely reduce the overall variance.

Yes. That is a better idea. So I replaced the function RANK by RANK.AVG.

For SDO-Calculation:

SARD-CI changed again: 6.065 - 7.817

MSARD-CI remained unchanged: 3.392 - 5.903

The SDO-"shared rank phenomenon" appeared for the last time in 2011 (thanks to the seeding I guess).
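The average-rank treatment of ties discussed above can be sketched in plain Python. This is only an illustration of the idea behind the spreadsheet's RANK.AVG, with hypothetical scores; higher scores get better (lower) ranks.

```python
def average_ranks(scores: list[float]) -> list[float]:
    """Rank scores in descending order; tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of scores equal to the one at `pos`.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared_rank = (pos + 1 + end + 1) / 2  # mean of the 1-based positions
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


# Two contestants tied for the top SVAO value share rank 1.5.
print(average_ranks([9.0, 9.0, 8.0]))  # [1.5, 1.5, 3.0]
```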

### Re: Some thoughts on ranking systems

Thanks. Now, once we have SARD and MSARD values for NP1, 2, and 3, we can wrap up this phase of discussion.

### Re: Some thoughts on ranking systems

This is completely unrelated to all the stats above, but would it be possible to have two necklaces per period, one for each division (Nova and Stella)?

### Re: Some thoughts on ranking systems

> 10ZHAbin wrote: This is completely unrelated to all the stats above, but will it possible to have two necklace per period each one for each division (Nova and Stella)?

There are two hurdles for this.

(1) This would mean one more poster per period. How much are you willing to spend to bribe Hikari-chan for this?

(2) This also means there won't be any top tier match between Nova and Stella before Postseason Phase II . I know some people actually want more opportunities for Nova and Stella to have a match against each other. How much are you willing to spend to bribe these people?

### Re: Some thoughts on ranking systems

Excel has calculated the following results:

NP1 = SVDO + 50 * NGP

CI (= confidence interval) using SARD (= Sums of absolute rank difference): 5.227 - 7.008

CI using MSARD (= minimum of sums of absolute rank difference): 2.978 - 5.257

NP2 = SVDO + 100 * NGP

CI-SARD: 5.182 - 7.053

CI-MSARD: 3.272 - 5.317

NP3 = SVDO + SVAO + 50 * NGP

CI-SARD: 7.202 - 9.464

CI-MSARD: 3.188 - 6.106

for comparison:

NPC = SDO/3 + 100 * NGP

CI-SARD: 6.065 - 7.817

CI-MSARD: 3.392 - 5.902

The best result for SVDO + C1*NGP is C1 = 90:

CI-SARD: 5.221 - 7.132

CI-MSARD: 3.327 - 5.261

[Full image]

> maglor wrote: If any of the model has SARD confidence interval (CI) that does not overlap with any of other SARD CI , we can call that model to have significantly different SARD value with very high confidence, then we really need to consider that model to be superior to other models.

Looking at the values (both SARD and MSARD) I would say NP1, NP2 and NP3 are not superior to NPC.

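The overlap criterion can be checked mechanically against the SARD intervals reported in this post. The helper names are hypothetical. The sketch only tests whether a model's interval is separated from every other model's interval.

```python
# SARD confidence intervals as reported above.
REPORTED_SARD_CI = {
    "NPC": (6.065, 7.817),
    "NP1": (5.227, 7.008),
    "NP2": (5.182, 7.053),
    "NP3": (7.202, 9.464),
}


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def separated_from_all(name: str, intervals: dict[str, tuple[float, float]]) -> bool:
    """True if the named model's interval overlaps no other model's interval."""
    return all(
        not overlaps(intervals[name], ci)
        for other, ci in intervals.items()
        if other != name
    )


for model in REPORTED_SARD_CI:
    print(model, separated_from_all(model, REPORTED_SARD_CI))
```

On these numbers NP3's interval clears NP1 and NP2 but still overlaps NPC, so no model is separated from all the others.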