Android - Internal XML dumped

Supervised News & Rumors concerning the Google Android Platform.

Postby plusminus » Fri Mar 21, 2008 8:14 pm

Hello Community,

via helloandroid.com.
If you have been struggling to figure out how we are supposed to know that there is a TextView with id "text1" in android.R.simple_list_item, or you want to see some of the layouts, animations, and drawables that Google uses for the applications included with the emulator, there is now a tool that decodes all those binary XML files to regular XML. Josh Guilfoyle has released a Perl script that takes a binary XML file from an apk or jar file and prints the resulting text XML.
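
To feed the converter, the binary XML first has to come out of the archive. Since apk and jar files are zip archives, something like the following might do; it is only a sketch, assuming a Perl recent enough to ship IO::Uncompress::Unzip. The sub name extract_entry and the file names in the usage comment are made up for illustration and are not part of Josh's tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Copy one named entry out of a zip-format archive (an .apk or .jar) into a
# file of its own, byte for byte. Returns the number of bytes written.
sub extract_entry {
    my ($archive, $entry, $out) = @_;

    my $bytes;
    unzip($archive => \$bytes, Name => $entry)
        or die "Cannot read $entry from $archive: $UnzipError\n";

    open(my $fh, '>', $out) or die "Cannot write $out: $!\n";
    binmode($fh);    # binary XML must not go through newline translation
    print {$fh} $bytes;
    close($fh) or die "Cannot close $out: $!\n";

    return length $bytes;
}

# Hypothetical usage: perl extract.pl Some.apk AndroidManifest.xml manifest.bin
if (@ARGV == 3) {
    my $n = extract_entry(@ARGV);
    print STDERR "wrote $n bytes\n";
}
```

The extracted file can then be passed to the converter as its only argument, with the XML redirected from standard output into a file.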

From his blog:
My primary motivation for doing this was to simply observe some of the common practices and get a sense for what Google is doing internally that isn't necessarily available through their API demos and samples. Below you can find two links to download either the stand-alone converter or the collected output as run over every APK file found in the phone's /system directory:

See next post: axml2xml.pl
Attached: android-xmldump.zip


All of the android.R resources can be found in the framework-res folder of the android-xmldump.tar.gz file.

/plusminus for anddev.org
Attachments
android-xmldump.zip
Android System XML-Files. 5 MB !
(4.79 MiB) Downloaded 1598 times

Postby plusminus » Fri Mar 21, 2008 8:15 pm

Josh's complete Perl-Script:
#!/usr/bin/perl
###############################################################################
##
## Copyright (C) 2008 Josh Guilfoyle <jasta@devtcg.org>
##
## Quick hack to reverse engineer Android's binary XML file format.  It is
## quite crude and much data is discarded in the format because I did not
## understand its meaning.  It seems to correctly parse all of the XML
## files distributed with the Android SDK however.
##
## This program is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by the
## Free Software Foundation; either version 2, or (at your option) any
## later version.
##
## This program is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
###############################################################################

use strict;
use Unicode::String qw(utf16le);
use Data::Dumper;

###############################################################################

# Tweak this if you want to see all the annoying debug output.  Writes to
# STDERR so you can use this in combination with piped XML output.
my $DEBUG = 0;

my $TAG_OPEN = 0x10;
my $TAG_SUPPORTS_CHILDREN = 0x100000;
my $TAG_TEXT = 0x08;

###############################################################################

local $/;
my $data = <>;
my $doc = { data => $data, pos => 0 };

###############################################################################
  88.  
# Some header, seems to be 3000 8000 always.
my @magic = unpack('vv', read_doc($doc, 4));

# Total file length.
my $length = unpack('V', read_doc($doc, 4));
debug("length=$length\n");

# Unknown, always 0100 1c00
my @unknown1 = unpack('vv', read_doc($doc, 4));

# Seems to be related to the total length of the string table.
my $tlen = unpack('V', read_doc($doc, 4));

# Number of items in the string table, plus some header nonsense?
my $strings = unpack('V', read_doc($doc, 4));
debug("strings=$strings\n");

# Seems to always be 0.
my $unknown2 = unpack('V', read_doc($doc, 4));

# Seems to always be 1.
my $unknown3 = unpack('V', read_doc($doc, 4));

# No clue, relates to the size of the string table?
my $unknown4 = unpack('V', read_doc($doc, 4));

# Seems to always be 0.
my $unknown5 = unpack('V', read_doc($doc, 4));

# Offset in string table of each string.
my @stroffs;

for (my $i = 0; $i < $strings; $i++)
{
        push @stroffs, unpack('V', read_doc($doc, 4));
}

debug(Dumper map { sprintf('%02x', $_) } @stroffs);

# Redeclared on purpose: from here on $strings maps offset => raw string.
my $strings;
my $curroffs = 0;

# The string table looks to have been serialized from a hash table, since
# the positions are not sorted :)
foreach my $offs (sort { $a <=> $b } @stroffs)
{
        die unless $offs == $curroffs;

        my $len = unpack('v', read_doc($doc, 2));

        my $str = read_doc($doc, ($len) * 2);
        debug("str=$str\n");

        # Read the NUL, we're not interested in storing it.
        read_doc($doc, 2);

        $strings->{$offs} = $str;

        $curroffs += (($len + 1) * 2) + 2;
}

my @strings = map { $strings->{$_} } @stroffs;

debugf("curroffs=%d (0x%x)\n", $curroffs, $curroffs);

for (my $i = 0; $i < @strings; $i++)
{
        debugf("0x%02x. %s\n", $i, $strings[$i]);
}
  228.  
#
# OPEN TAG:
# V=tagS 0x1400 1400 V=1 V=0 {ATTR? V=7 V=attrS V=valS V=attrS|0x3<<24 V=valS 0x0301 1000 V=0x18 V=? } V=~0 V=~0
# V=1    0x0800 0000 V=0x19 0x0201 1000 V=0x38 V=7 V=~0 V=~0
#
# OPEN TAG (normal, child, 3 attributes):
# V=tagS 0x1400 1400 V=3 V=0 V=xmlns V=attrS V=valS 0x0800 0010 V=~0 V=xmlns V=attrS V=valS V=0x0800 0010 V=~0 V=xmlns V=attrS V=valS 0x0800 0003 V=valS 0x0301 1000 V=0x18? V=0x0b? V=~0 V=~0
#
# OPEN TAG (outer tag, no attributes):
# V=tagS 0x1400 1400 V=0    V=0         0x0401 1000 V=0x1c V=0    V=~0
# V=1    0x0800 0000 V=0x20 0x0201 1000 V=0x38      V=0x4  V=~0   V=~0
#
# OPEN TAG (normal, child, NO ATTRIBUTES):
# V=tagS 0x1400 1400 V=0    V=0         0x0301 1000 V=0x18 V=0x0b V=~0 V=~0
#
# CLOSE TAG (normal, child):
# V=tagS 0x0401 1000 V=0x1c V=0         V=~0
#
# CLOSE TAG (outer tag):
# V=tagS 0x0101 1000 V=0x18 V=0x0c      V=~0

# Looks like the string table is word-aligned.
$doc->{pos} += ($doc->{pos} % 4);
debugf("pos=0x%x\n", $doc->{pos});

#my $no_clue1 = read_doc($doc, 48);

my $no_clue2 = read_doc_past_sentinel($doc);

#my $nstag = unpack('V', read_doc($doc, 4));
#my $nsurl = unpack('V', read_doc($doc, 4));
#
#my $nsmap = { $nsurl => $nstag };
#my $nstags = { reverse %$nsmap };
#
#my $nsdummy = read_doc($doc, 20);

my $nsmap = {};

debugf("pos=0x%x\n", $doc->{pos});

my $parsed = read_meat($doc);

debug("All done, DUMPING XML:\n");
print "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
print_tree($parsed, 0);

my $nsend = read_doc($doc, 8);

###############################################################################
  330.  
sub print_tree
{
        my ($node, $depth) = @_;

        print "\t" x $depth;

        if (($node->{flags} & $TAG_TEXT) != 0)
        {
                print $node->{name}, "\n";
                return;
        }

        print '<';

        if (($node->{flags} & $TAG_OPEN) == 0)
        {
                print '/';
        }

        print $node->{name};

        foreach my $attr (@{$node->{attrs}})
        {
                if (scalar(@{$node->{attrs}}) == 1)
                {
                        print ' ';
                }
                else
                {
                        print "\n", "\t" x ($depth + 1);
                }

                $attr->{ns} and
                  printf '%s:', $attr->{ns};

                printf '%s="%s"', @$attr{qw/name value/};
        }

        scalar(@{$node->{children}}) == 0 and
          print ' /';

        print ">\n";

        if (scalar(@{$node->{children}}) > 0)
        {
                foreach my $child (@{$node->{children}})
                {
                        print_tree($child, $depth + 1);
                }

                print "\t" x $depth, "</$node->{name}>\n";
        }
}

###############################################################################
  442.  
sub read_meat
{
        my $tag = read_tag($doc);
        die unless $tag;

        $tag->{children} = [ read_children($doc, $tag->{name}) ];

        return $tag;
}

sub read_children
{
        my ($doc, $stoptag) = @_;
        my @tags;

        while ((my $tag = read_tag($doc)))
        {
                # Whitespace leaks into this, but we don't support parsing it
                # correctly.
#               next unless $tag->{name} =~ m/[a-z]/i;

                if (($tag->{flags} & $TAG_SUPPORTS_CHILDREN) != 0)
                {
                        if (($tag->{flags} & $TAG_OPEN) != 0)
                        {
                                $tag->{children} = [ read_children($doc, $tag->{name}) ];
                        }
                        elsif ($tag->{name} eq $stoptag)
                        {
                                last;
                        }
                }

                push @tags, $tag;
        }

        return @tags;
}

sub read_tag
{
        my ($doc, $stoptag) = @_;
        my $tag;
        my @xmlns;

# Hack to support the strange xmlns attribute encoding without disrupting our
# processor.
READ_AGAIN:
        my $name = unpack('V', read_doc($doc, 4));
        debugf("tag=%s (%d) @ 0x%x\n", $strings[$name], $name, $doc->{pos});

        my $flags = unpack('V', read_doc($doc, 4));
        debugf("        flags=0x%08x (%d, open=%d, children=%d, text=%d)\n", $flags, $flags, $flags & $TAG_OPEN, $flags & $TAG_SUPPORTS_CHILDREN, $flags & $TAG_TEXT);

        # Strange way to specify xmlns attribute.
        if ($strings[$name] && $strings[$flags])
        {
                my $ns = utf16le($strings[$name])->utf8;
                my $url = utf16le($strings[$flags])->utf8;

                # TODO: How do we expect this?
                if ($ns =~ m/[a-z]/i && $url =~ m/^http:\/\//)
                {
                        debug("new map: $flags => $name\n");
                        $nsmap->{$flags} = $name;
                        push @xmlns, { name => "xmlns:$ns", value => $url };
                        read_doc_past_sentinel($doc);
                        goto READ_AGAIN;
                }
        }

        if (($flags & $TAG_SUPPORTS_CHILDREN) != 0 && ($flags & $TAG_OPEN) != 0)
        {
                $tag->{attrs} = [ @xmlns ];

                my $attrs = unpack('V', read_doc($doc, 4));
                debugf("        attrs=%d\n", $attrs);

                my $unknown = unpack('V', read_doc($doc, 4));

                while ($attrs-- > 0)
                {
                        my $ns = unpack('V', read_doc($doc, 4));

                        $ns != 0xffffffff and
                          debugf("              namespace=%s\n", $strings[$ns]);

                        my $attr = unpack('V', read_doc($doc, 4));
                        debugf("                attr=%s\n", $strings[$attr]);

                        # TODO: Escaping?
                        my $value = unpack('V', read_doc($doc, 4));
                        debugf("                value=%s\n", $strings[$value]);

                        my $attrflags = unpack('V', read_doc($doc, 4));

                        my $attr = {
                                name => utf16le($strings[$attr])->utf8,
                                value => utf16le($strings[$value])->utf8,
                                flags => $attrflags,
                        };

                        $ns != 0xffffffff and
                          $attr->{ns} = utf16le($strings[$nsmap->{$ns}])->utf8;

                        push @{$tag->{attrs}}, $attr;

                        my $padding = unpack('V', read_doc($doc, 4));
#                       read_doc_past_sentinel($doc, 1);
                }

                read_doc_past_sentinel($doc);
        }
        else
        {
                # There is strong evidence here that what I originally thought
                # to be a sentinel is not ;)
                my $whatever = unpack('V', read_doc($doc, 4));
                my $huh = unpack('V', read_doc($doc, 4));

                read_doc_past_sentinel($doc);
        }

        $tag->{name} = utf16le($strings[$name])->utf8;
        $tag->{flags} = $flags;

        return $tag;
}

###############################################################################
  704.  
sub read_doc
{
        my ($doc, $n) = @_;

        (length($doc->{data}) - $doc->{pos}) < $n and
          die "Not enough data to read $n bytes at $doc->{pos}.\n";

        my $data = substr($doc->{data}, $doc->{pos}, $n);
        $doc->{pos} += $n;

        return $data;
}

sub peek_doc
{
        my ($doc, $n) = @_;

        my $data = read_doc($doc, $n);
        $doc->{pos} -= $n;

        return $data;
}

sub read_doc_past_sentinel
{
        my ($doc, $count) = @_;

        my $pos = $doc->{pos};

        # Read to sentinel.
        while ((my $word = read_doc($doc, 4)))
        {
                last if unpack('V', $word) == 0xffffffff;
        }

        my $n = 1;

        # Read past it.
        if (!defined($count) || $count < $n)
        {
                while ((my $word = peek_doc($doc, 4)))
                {
                        last unless unpack('V', $word) == 0xffffffff;

                        read_doc($doc, 4);
                        $n++;

                        last if (defined($count) && $count >= $n);
                }
        }

        debugf("[skipped %d sentinels, %d bytes]\n", $n, $doc->{pos} - $pos);
}

###############################################################################

sub debug($;@)
{
        print STDERR @_ if $DEBUG;
}

sub debugf($;@)
{
        printf STDERR @_ if $DEBUG;
}
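
The header and string-table layout that the script's comments describe can be tried out in isolation. The sketch below is not Josh's code: it builds a synthetic buffer with that layout and reads the strings back, using the core Encode module (assuming Perl 5.8 or later) in place of Unicode::String. The sub names build_pool and read_pool are invented for this example, every header value other than the string count is a placeholder, and it seeks to each offset directly rather than sorting the offsets and reading sequentially as the script does.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Build a synthetic buffer laid out the way the script reads it: a 36-byte
# header, one 32-bit offset per string, then length-prefixed UTF-16LE strings.
# Only handles characters that fit in a single 16-bit unit.
sub build_pool {
    my @strings = @_;

    my $table = '';
    my @offsets;
    foreach my $s (@strings) {
        push @offsets, length($table);
        $table .= pack('v', length($s))       # length in 16-bit units
                . encode('UTF-16LE', $s)      # the characters themselves
                . "\0\0";                     # trailing 16-bit NUL
    }

    my $total = 36 + 4 * scalar(@offsets) + length($table);
    # magic, file length, unknown, tlen, string count, four more words
    return pack('vv V vv V V V4',
                3, 8, $total, 1, 0x1c, 0, scalar(@strings), 0, 1, 0, 0)
         . pack('V*', @offsets)
         . $table;
}

# Read the strings back, in the order of the offset array.
sub read_pool {
    my ($data) = @_;
    my $pos = 0;

    # Bounds-checked read, in the spirit of read_doc() in the script.
    my $take = sub {
        my ($n) = @_;
        die "Not enough data to read $n bytes at $pos.\n"
            if length($data) - $pos < $n;
        my $chunk = substr($data, $pos, $n);
        $pos += $n;
        return $chunk;
    };

    $take->(16);                                  # magic, length, unknown, tlen
    my $count = unpack('V', $take->(4));          # number of strings
    $take->(16);                                  # four words the script ignores
    my @offsets = map { unpack('V', $take->(4)) } 1 .. $count;

    my $base = $pos;                              # offsets are relative to here
    my @strings;
    foreach my $off (@offsets) {
        $pos = $base + $off;
        my $len = unpack('v', $take->(2));        # length prefix, 16-bit units
        push @strings, decode('UTF-16LE', $take->($len * 2));
    }
    return @strings;
}

print join(',', read_pool(build_pool('text1', 'TextView'))), "\n";
```

Each string costs its length prefix, two bytes per character, and a two-byte NUL, which is where the `(($len + 1) * 2) + 2` step in the script comes from.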

Postby haitian » Tue Apr 08, 2008 7:41 am

I have downloaded axml2xml.pl and installed Perl 5.6.1 for Win32.

But when I run it from the command line like this:

D:\Android\Android_tools\axml2xml>axml2xml.pl AndroidManifest.xml

it returns these errors:

Can't locate Unicode/String.pm in @INC (@INC contains: C:/Perl/lib C:/Perl/site/lib .) at D:\Android\Android_tools\axml2xml\axml2xml.pl line 24.
BEGIN failed--compilation aborted at D:\Android\Android_tools\axml2xml\axml2xml.pl line 24.

I searched for the error on Google and found someone saying that I need to download a package:

Unicode-String-2.09

But it looks like a Linux package, and I don't know how to install it on Windows XP.

Can anyone help me? Thank you.
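
The error means that the Unicode::String module named on line 24 of the script is not installed; it comes from CPAN and is not bundled with Perl, so it has to be added through whatever package tool the Perl distribution provides. A possible alternative, offered only as an untested sketch: on Perl 5.8 or later (so not on 5.6.1), the core Encode module can do the same UTF-16LE to UTF-8 conversion. The helper name utf16le_to_utf8 is made up here and is not part of Josh's script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for the script's utf16le($raw)->utf8 calls:
# takes raw UTF-16LE bytes and returns UTF-8 encoded bytes.
sub utf16le_to_utf8 {
    my ($raw) = @_;
    return encode('UTF-8', decode('UTF-16LE', $raw));
}

# "text1" as it would sit in the string table: two bytes per character.
my $raw = pack('v*', map { ord($_) } split(//, 'text1'));
print utf16le_to_utf8($raw), "\n";
```

Using it in the script would mean replacing each utf16le(...)->utf8 call with the helper and dropping the Unicode::String import.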

Postby plusminus » Fri May 16, 2008 7:56 pm

I just experienced how useful this is :)
It helps a lot when replacing system applications, by stealing their AndroidManifest code.

Example: viewtopic.php?t=2105