[request] XML output of scores
#1
Hi all,

Recently BluePrint told me it would be good to have a way to save players' scores from the server without parsing logs. I wasn't convinced at first, but then I understood it would be great because:
1) we could have access to a lot of stats, like accuracy per weapon, kills per weapon, and more...
2) it would allow people to make ladders more easily, for example

So I thought about making an XML output, written by the server, with about one entry per player per "session" (a session could be the period between a player's connection and disconnection, or between two name changes, with a minimum duration to avoid spam in the output file).

(XML would be a good file structure imho)
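A minimal sketch of what such an output could look like, written in Python for brevity (the server itself is C++). The element and attribute names and the 60-second minimum are assumptions made for illustration; they are not taken from the actual patch.

```python
import xml.etree.ElementTree as ET

MIN_SESSION_SECONDS = 60  # assumed threshold, chosen only for this illustration


def session_to_xml(name, start, end, frags, deaths):
    """Return a <session> element, or None if the session is too short to keep."""
    if end - start < MIN_SESSION_SECONDS:
        return None
    entry = ET.Element("session", {"name": name, "start": str(start), "end": str(end)})
    ET.SubElement(entry, "frags").text = str(frags)
    ET.SubElement(entry, "deaths").text = str(deaths)
    return entry


root = ET.Element("scores")
# Hypothetical sessions: (name, start time, end time, frags, deaths).
for args in [("alice", 0, 300, 12, 7), ("bob", 10, 25, 1, 0)]:
    entry = session_to_xml(*args)
    if entry is not None:  # bob's 15-second session is filtered out
        root.append(entry)

print(ET.tostring(root, encoding="unicode"))
```

Filtering before writing keeps very short sessions (for example, rapid reconnects) out of the file.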

I made & tested this system (thanks to Gibstick, aerkefiende, daylixx and brett, for the help testing it). It seems to be working fine.

I told RK but he didn't answer. That's why I'm asking the devs: would you like to, or could you, implement it? I can share the source of my work if you need it to see more precisely what I'm talking about.

Thanks !
#2
It's nice that you've decided to store it in a more accessible format than the current one.

However, knowing that a single server produces so much data in binary [compressed] demo format, I doubt XML would be of any use. The few big disadvantages with XML:
1) It still needs to be parsed (though you wouldn't have to be the one writing the parser). That means that you'd either have to use more and more libraries, and that'd require far more processing power, albeit being easier to use.
2) It will require a lot of storage space. A single (even compressed) XML file with such data will become very large in size. There's no point redundantly storing things like "<frags>123</frags>" when it could be stored as 7B in hexadecimal (look at the length).
3) It can be easily abused: how will you consider, for example, cases where a user joins the server with 14 clients from the same IP with the same name, and then starts changing those names with every reconnection for a 15-minute period?
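To make the size argument in point 2 concrete, here is a small Python check. It assumes the frag count fits in one unsigned byte, which only holds for values from 0 to 255.

```python
import struct

xml_field = b"<frags>123</frags>"
binary_field = struct.pack("B", 123)  # one unsigned byte, valid for 0..255 only

# Length of the XML form, length of the binary form, and the byte in hex.
print(len(xml_field), len(binary_field), binary_field.hex())
```

The XML form takes 18 bytes against a single byte (0x7b) for the binary form.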

You really need to revise your model...

And to counter your "advantageous" points:
1) All of that is provided by the demos already, and servers do output those.
2) I don't think unskilled coders should be taking on very large things like ladders. You'll start to get a lot of "trashy" projects coming out of something like this, as opposed to a few good outcomes like what you've worked on as an example.
#3
(05 Jan 11, 03:17PM)Drakas Wrote: It's nice that you've decided to store it in a more accessible format than the current one.

However, knowing that a single server produces so much data in binary [compressed] demo format, I doubt XML would be of any use. The few big disadvantages with XML:
1) It still needs to be parsed (though you wouldn't have to be the one writing the parser). That means that you'd either have to use more and more libraries, and that'd require far more processing power, albeit being easier to use.
2) It will require a lot of storage space. A single (even compressed) XML file with such data will become very large in size. There's no point redundantly storing things like "<frags>123</frags>" when it could be stored as 7B in hexadecimal (look at the length).
3) It can be easily abused: how will you consider, for example, cases where a user joins the server with 14 clients from the same IP with the same name, and then starts changing those names with every reconnection for a 15-minute period?

You really need to revise your model...

And to counter your "advantageous" points:
1) All of that is provided by the demos already, and servers do output those.
2) I don't think unskilled coders should be taking on very large things like ladders. You'll start to get a lot of "trashy" projects coming out of something like this, as opposed to a few good outcomes like what you've worked on as an example.

Yes, I know what you said about output size. But the server can perform many tests to avoid abuse (for example, I added a minimum of 1 minute per "session"). We can, and have to, filter what we save.
Also, even if XML still needs to be parsed (that's true), many libraries exist for that; they are easy to integrate in most cases, and they're quite powerful?!

About demos, their format is harder to read... The same problems as with XML, only worse :s

Thanks for your answer, I hope I'm not going in a totally wrong direction :D
#4
(05 Jan 11, 03:17PM)Drakas Wrote: And to counter your "advantageous" points:
1) All of that is provided by the demos already, and servers do output those.
O.o You heathen! Jk.
This is something a lot of server owners might like to use for their own sites. And I think for individual servers, this may be the way to go. It may not make it into the official code but feel free to post what you've got.
#5
drakas, lucas didn't explain my idea fully.
the server output would be changed to a friendlier format, meaning that parsing is not needed! the log output is ready to go into a db/ladder - this could allow real-time display of a game.

i tested this with something similar to the CSL / web version i made, and it's a lot more resource friendly
#6
Fiz had brought this idea up before as well, saying he wanted to put chat (for example) into a SQL database so that it could be searched.

I hope something like this gets made. It may be as simple (if you call it that) as making a cron job to do lots of stream editing once in a while, but I don't know how clean that would end up. From my attempts at this kind of stuff, I would say not very haha.
#7
I have A LOT going on, working on efficiently gathering the log data and putting it in a more usable form. Right now I have a PHP script opening a pipe and starting the server itself, so it's essentially running the server inside itself, and it logs every line the server spits out directly to MySQL. I plan to expand that to actually parse those lines and put the results into different database tables for chat and other stats. This seems to be working OK even with a full osok server (18 players max).
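The idea of running the server as a child process and logging each output line can be sketched as follows. This is not the PHP class from this thread: it is a Python illustration that uses SQLite as a stand-in for MySQL and a tiny child process as a stand-in for the game server. The table name and the function name are made up for the example.

```python
import sqlite3
import subprocess
import sys


def run_and_log(cmd, db):
    """Start cmd as a child process and insert each output line into the database."""
    db.execute("CREATE TABLE IF NOT EXISTS log (id INTEGER PRIMARY KEY, line TEXT)")
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so nothing the child prints is lost
        text=True,
    )
    for line in proc.stdout:  # blocks until the child writes a line or exits
        db.execute("INSERT INTO log (line) VALUES (?)", (line.rstrip("\n"),))
        db.commit()
    return proc.wait()


# Stand-in for the game server: a child process that prints two lines.
fake_server = [sys.executable, "-c", "print('player joined'); print('player fragged')"]
db = sqlite3.connect(":memory:")
exit_code = run_and_log(fake_server, db)
print(exit_code, db.execute("SELECT line FROM log ORDER BY id").fetchall())
```

Because the parent reads the pipe directly, every line is seen in order as it is produced; parsing could be added inside the loop or done later from the table.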

I have been reading and pondering what to say in this thread and I have a few suggestions. XML is a very good format for readability and has a ton of tool sets to parse it, but it is a very inefficient storage format and has a TON of overhead and bloat. Sure, it can be minimized by keeping the tags simple, but it's still very bloaty. A more efficient format might be JSON, which currently has a lot of tool sets too, in many languages. That would be my first choice. A second choice would be LLSD, which is one of the more efficient XML formats and might be an OK choice too. Again, XML has bloat in pretty much any format.
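As a rough illustration of the overhead difference, the same record can be serialized both ways and the lengths compared. The field names below are invented for the example, and the exact numbers depend entirely on the record and the tag names chosen.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical score record; the field names are assumptions.
record = {"name": "player1", "frags": 123, "deaths": 45}

# Compact JSON: no spaces after separators.
as_json = json.dumps(record, separators=(",", ":"))

# Equivalent XML with one child element per field.
player = ET.Element("player")
for key, value in record.items():
    ET.SubElement(player, key).text = str(value)
as_xml = ET.tostring(player, encoding="unicode")

print(len(as_json), as_json)
print(len(as_xml), as_xml)
```

JSON names each field once, while XML repeats every name in an opening and a closing tag, which is where most of the extra bytes come from.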

Outside of creating a new log format or an additional output, the demos have been a topic of discussion in IRC since this thread was started, and I think working on a way to parse the demos might be a good solution too. I think the only reason it isn't done yet is the lack of documentation or examples on how that could be done and the fact that it's in binary format.

What we have to keep in mind is that any "formatted" storage is going to be more bloaty than a plain output of the logs. So that should be a given; we just have to figure out how much bloat and to what gain...

I also wanted to comment on the trashy projects thing. Sometimes making things simple can fill an area with a lot of trash, but you also create competition to be better among the more capable ones. The trash will be weeded out, and things like ladders will get more efficient and be run better. The current choice of ladders is less like a choice and more like all you have, and it could be improved on. When there are alternatives, attitudes change and the push to be the best grows. It's just like when a gas station opens across the street from the overpriced one that used to be by itself: prices go down and service gets better. So I think making it easier and more efficient to do the things the community wants to do will only be beneficial, even if it allows for more trash.
#8
Very nice points, Fiz :D
#9
(07 Jan 11, 11:06PM)Fiz Wrote: I think the only reason it isn't done yet is the lack of documentation or examples on how that could be done and the fact that its in binary format.

A demo file is just a gzipped stream of packets, with a header at the start. readdemo() reads the data packet by packet and forwards it through enet to parsemessages() in clients2c.cpp, where the information is read from the packets using getint() and friends and is then processed.

Now you could just get in at readdemo(), but instead of passing the stuff through enet, you'd just take it apart right there.

Not easy , but possible - read the code :P
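For readers who do not work in C++, here is a Python sketch of the variable-length integer decoding that getint() performs in the Cube engine family, as best I recall it: a first byte of 0x80 means a 16-bit value follows, 0x81 means a 32-bit value follows, and anything else is the value itself. Treat this as an assumption and check it against the AssaultCube source before relying on it. The demo header layout and packet framing are not described in this thread, so they are left out; Python's gzip.open() could be used to unwrap the file first.

```python
import struct


def getint(buf, pos):
    """Decode one integer at buf[pos]; return (value, next_pos)."""
    (c,) = struct.unpack_from("<b", buf, pos)  # first byte, read as signed
    if c == -128:  # marker 0x80: a signed 16-bit little-endian value follows
        (n,) = struct.unpack_from("<h", buf, pos + 1)
        return n, pos + 3
    if c == -127:  # marker 0x81: a signed 32-bit little-endian value follows
        (n,) = struct.unpack_from("<i", buf, pos + 1)
        return n, pos + 5
    return c, pos + 1  # small values fit in the single byte


def getints(buf):
    """Decode every integer packed back to back in buf."""
    values, pos = [], 0
    while pos < len(buf):
        value, pos = getint(buf, pos)
        values.append(value)
    return values


# Hand-built sample bytes, not data from a real demo file.
packet = bytes([0x05, 0x80, 0x2C, 0x01, 0x81, 0xA0, 0x86, 0x01, 0x00, 0xFF])
print(getints(packet))
```

With the encoding assumed above, the sample bytes decode to 5, 300, 100000 and -1.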
#10
(08 Jan 11, 12:47AM)tempest Wrote:
(07 Jan 11, 11:06PM)Fiz Wrote: I think the only reason it isn't done yet is the lack of documentation or examples on how that could be done and the fact that its in binary format.

A demo file is just a gzipped stream of packets, with a header at the start. readdemo() reads the data packet by packet and forwards it through enet to parsemessages() in clients2c.cpp, where the information is read from the packets using getint() and friends and is then processed.

Now you could just get in at readdemo(), but instead of passing the stuff through enet, you'd just take it apart right there.

Not easy , but possible - read the code :P

Ok, well I will just state this plainly. C/C++ is my kryptonite :P I can read the code most of the time, but it's pretty hard to comprehend on a level that allows me to convert it to the languages I use. Since it was put in my head, I did start looking through the source for the code, to try to convert it to PHP (my fav language) and see if I can get some useful data from the demos. I believe they should be able to be parsed relatively fast. Now that I have the file (thanks to you :)) it will be easier to figure out.... maybe...

I did have some concerns about it, especially after comparing the direct output of the console with the log files. A few things I think might be useful are missing in the log files but can be gotten if the console is grabbed. I noticed that team switching is not logged in the log files, so it is hard to see when someone switches from one team to another. That is useful for real-time stats and maybe some other ladder-based "awards". The other is who votes on what, also possibly useful for ladder "awards" but more importantly to see if there is any voting abuse and who is involved. Neither is available in the logs even with the highest logging setting. I am curious to see if it is available in the demos.

Edit: with highest logging who voted for what is available sorry.

Also, I didn't mean "lack of documentation" as in a failure on the devs' part, but that if anyone has done it before, it's not documented. I do seem to remember someone having a system to parse demos and look for cheaters, but I couldn't find any code on the net that showed how that might be done.
#11
(08 Jan 11, 12:59AM)Fiz Wrote: I do seem to remember someone having a system to parse demos and look for cheaters
I think that was Brahma.
#12
The purpose of a log is that it MUST be readable by a human. That means simple: time and location. Nothing more, and no cluttering information such as XML tags.

If logging in XML were possible, it should be a special logging level. And you would need to publish the XSD files somewhere.

I actually use Linux bash for parsing all the information.
#13
in this case, we should make a system to read demos... I can try to code it in PHP if I have time
#14
(08 Jan 11, 02:11PM)Alien Wrote: The purpose of a log is that it MUST be readable by a human. That means simple: time and location. Nothing more, and no cluttering information such as XML tags.

If logging in XML were possible, it should be a special logging level. And you would need to publish the XSD files somewhere.

I actually use Linux bash for parsing all the information.

Human readability is the final and only goal of the concepts laid out in this thread.

Alien, please refrain from posting if you can't follow the rules. You may be particularly interested in number nineteen (Do not post ”empty” or useless responses).
#15
argh!

if log output from the server was changed, it would be a clean readable format that needs no parsing,

the XML is merely for the ladder/db side of it as i can call the stats using this, like i do with the TyD one. and relay it to AC

YAML is also another way to go.

i thought of the demo parsing for info but that is time consuming, straight from the server is ideal.

stdout? <this would enable more ways to relay the info
#16
I like the concept of iccup.org again. Server admins only care about their own stuff.
For ladders, demos (replays) are used, which are analysed by third-party programs.
http://bwchart.teamliquid.net/ So everyone can check how they played and check whether their opponents cheated, etc.

But since we already have the current system and infrastructure built around logs, it would be stupid not to use it.

@eftertanke
What is your problem? I have solved logging issues in commercial applications (mainly on Microsoft platforms: WCF, web services and executables too), and I know that not every customer has a parser to read the logs. Rather than writing another program for them to read the logs, plus a tutorial on how to use it, I just make the logs easy to read.

#17
I started to work on demo reading and it doesn't look really simple. There should be smarter ways to do this, really. Right now, I don't know how...

XML: no, for the reasons given above

JSON: would it really be better?

demos: no, because parsing demos would be redundant with what the server already processes; I mean it's much easier to make the server do this work itself

Fiz's solution: I would like to know more about it. Does it allow getting extended information? What happens if the link between the PHP server and the AC server is broken? Is data lost?

YAML: I'm currently checking out this solution, it seems good; what do you think about it? (thanks BluePrint for the idea)

There may be other solutions... like a lighter format to save scores (for example, saving scores as I did with my XML system but in the same format as demos?)

NB: it would be interesting to keep the "real time" aspect, if you see what I mean
#18
you can save in YAML format, its easily readable by many systems/methods

YAML > DATABASE > FRONTEND LADDER > XML OUTPUT :-)
#19
(08 Jan 11, 05:53PM)Blue_Pr!nt Wrote: you can save in YAML format, its easily readable by many systems/methods

YAML > DATABASE > FRONTEND LADDER > XML OUTPUT :-)
yes and I saw a c++
#20
Now, as far as logs are concerned, it is JSON vs YAML.

If I were faced with the ladder problem, I would have the server call a web service, and create a proper web service for it (real time, but there is the problem of the address)
or
make a ladder based on demos which players upload themselves (no real time)
#21
(08 Jan 11, 05:38PM)Luc@s Wrote: Fiz's solution: I would like to know more about it. Does it allow getting extended information? What happens if the link between the PHP server and the AC server is broken? Is data lost?

Well, there is no way for the link to be broken really. The server would be a sub/child process of PHP, so it's running inside PHP. So far it's been working great, and you get all the output that the server gives on the console, including stdout and stderr, as well as being able to use stdin for sending keystrokes to the server, in case some ability to do things inside the console is added in the future.

Here is the class I am working on now; maybe someone can improve it or whatever. Currently all it does is log the lines to a MySQL database. I am still trying to figure out whether the parsing of the logs should be done right then or by another script outside of this one, to avoid slowing it down.

PHP Class:
http://pastie.org/private/svcyjlkxtzvobqeaukds1g

Actual Execution:
http://pastie.org/private/xhwanrz1lehwntvzmiu42w

Disclaimer: PHP scripts are sold as is and no warranty is given; scripts may cause indigestion and/or bloating. :P Commenting in the code is also very sparse, so feel free to ask me about what's going on.

I think that, to make this what is wanted, the parsing of the output should probably be done inline in the same script, so that it's the only process that needs to be started/managed.

I think the focus of this thread has got blurred a little. So I ask: was the focus to make logs easier to read or to make them more parsable? I think my solution can accomplish both, but it will be in a harder-to-access medium requiring an interface to view. If the focus is making things easier to manage in a ladder, which is mostly in a web interface, I don't see that as a problem. But if the focus is to make the logs human readable locally, I think that maybe the bulkier XML storage format would be better, mainly because XML+XSLT will make the XML file look exactly like what you want, and things like real-time search could be added with JavaScript. Of course the XML files would have to be opened in a browser, but since that's the default on most computers it wouldn't be a problem either.

Sorry that I am always so long winded but my mind goes crazy when it comes to figuring programming stuff out :P


#22
combination xml and xslt is good idea
#23
Ok Fiz, first, thanks a lot for sharing your stuff! Good work btw

But sadly it's not really what we're aiming at (or at least what I am aiming at :D).
Actually, the focus is to make the server save stats itself. Because 1) it's the best "side" to save stats, and 2) it can save more things (like accuracy).
But your system is really interesting for match-following, for example :D

I need to try YAML.
gonna do that.
#24
PHP class Wrote:also I am sure the $buffer needs to be sanitized more

mysql_real_escape_string(), always use the specific function for the database you're using. Other than that, why?
#25
(08 Jan 11, 07:11PM)Luc@s Wrote: Ok Fiz, first, thanks a lot for sharing your stuff! Good work btw

But sadly it's not really what we're aiming at (or at least what I am aiming at :D).
Actually, the focus is to make the server save stats itself. Because 1) it's the best "side" to save stats, and 2) it can save more things (like accuracy).
But your system is really interesting for match-following, for example :D

I need to try YAML.
gonna do that.

Maybe it would be a good idea for the devs to make it possible to write "modules" that allow things like hooks on events, so that outside code can be added to catch kills/teamkills/game changes and all that. Maybe a sort of API. Maybe out of the scope of this thread? Or MAYBE the first example module can be a module that creates stats :D
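A rough Python sketch of what such an event-hook API might look like. No such API exists in the server as far as this thread shows; the class, the event names and the handler signature are all invented for illustration.

```python
from collections import defaultdict


class EventHooks:
    """Tiny publish/subscribe registry: modules register callbacks per event name."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event, handler):
        self._handlers[event].append(handler)

    def emit(self, event, **data):
        for handler in self._handlers[event]:
            handler(**data)


# Example "stats module" that counts frags per player.
frags = defaultdict(int)


def count_kill(actor, target):
    frags[actor] += 1


hooks = EventHooks()
hooks.on("kill", count_kill)

# The server core would call emit() where it currently writes a log line.
hooks.emit("kill", actor="alice", target="bob")
hooks.emit("kill", actor="alice", target="carol")
hooks.emit("teamswitch", player="bob")  # no handler registered, so nothing happens
print(dict(frags))
```

With this shape, a stats writer, a chat logger and a ladder uploader could each be separate modules listening to the same events.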

(08 Jan 11, 07:39PM)tempest Wrote:
PHP class Wrote:also I am sure the $buffer needs to be sanitized more

mysql_real_escape_string(), always use the specific function for the database you're using. Other than that, why?

mysql_real_escape_string will work fine, but the reason is that people who "know" it's run inside of PHP could, in its current state of insertion, fool addslashes with a well-crafted message in the game and execute a MySQL query inside of that query, to damage the database or do various other things.
#26
How about prepared statements?
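A short Python illustration of the idea, using SQLite as a stand-in for MySQL (placeholder syntax differs between drivers; the table and the hostile message are invented for the example).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chat (player TEXT, message TEXT)")

# A hostile in-game chat line that tries to break out of a quoted SQL string.
message = "gg'); DROP TABLE chat; --"

# The ? placeholders send the values separately from the SQL text,
# so the message is stored as data and never parsed as SQL.
db.execute("INSERT INTO chat (player, message) VALUES (?, ?)", ("mallory", message))
db.commit()

print(db.execute("SELECT player, message FROM chat").fetchall())
```

The chat line ends up in the table verbatim, and no manual escaping is involved.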

And what's wrong with reading the logfile continuously? (hint: you could even use 'tail -n 0 -f logfile.txt')
#27
(08 Jan 11, 08:06PM)Drakas Wrote: How about prepared statements?

And what's wrong with reading the logfile continuously? (hint: you could even use 'tail -n 0 -f logfile.txt')

Well, depending on how fast the log is scrolling, tail might miss some things :P and reading the whole log file can get to be a resource hog when the log file grows in size. If you're only doing it every so often it's fine, but my objective in creating the PHP pipe was to process the stats in real time, so that real-time stats on the current game as well as long-term stats could be provided.
#28
It never misses anything, and most certainly it is not a resource hog.
#29
No, no, it's not that tail is a resource hog; reading the whole log file in anything at a rapid pace would be. tail itself isn't, because it only spits out the last few lines. But say you have tail set to show the last 10 lines, and in between processing those 10 lines 20 lines get put into the log: then you miss the 10 extra lines that got written while you were processing the 10 you managed to grab. Using tail to spit the whole file out each time WOULD be a resource hog if the log file size was 10 MB or larger, which I see regularly just on an osok server; I imagine on CTF servers and such they grow even bigger, faster. Now, I know processing 10 lines of a log should be pretty instant, but 10 lines can be written to the log pretty fast too, especially when a round changes or if some type of abuse is happening. I'm not saying this would happen often, but with that method there is always the possibility for data to get lost.

It's also very hard to keep track of what you have already processed, since there is no unique identifier provided. If you use syslog you have a pretty unique timestamp, but I imagine if some logs come in fast enough they might have the same timestamp; then again, it's using another external source (syslog) to collect the logs.
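One way to keep track of what has already been processed, without any identifier in the log itself, is to remember the byte offset reached in the file and continue from there on the next pass. The Python sketch below shows the idea with a temporary file standing in for the server log. It is an illustration under that assumption, not a description of how tail is implemented, and it does not handle log rotation or truncation.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (complete lines appended since offset, new offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Only consume up to the last newline; a half-written line is left for next time.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


# Demonstration with a temporary file standing in for the server log.
fd, log_path = tempfile.mkstemp()
os.close(fd)

with open(log_path, "ab") as log:
    log.write(b"line 1\nline 2\npartial")
first, offset = read_new_lines(log_path, 0)

with open(log_path, "ab") as log:
    log.write(b" line 3\nline 4\n")
second, offset = read_new_lines(log_path, offset)

os.remove(log_path)
print(first, second)
```

Each pass reads only the bytes added since the previous one, so the cost does not grow with the size of the log, and lines written between passes are picked up on the next pass rather than skipped.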
#30
wait, if you use log files you'll lose a lot of data (weapon stats, combos)
maybe I missed something?