[ale] Text Processing Happiness - I'm lost

David Tomaschik ozone at webgroup.org
Sat Aug 18 01:49:22 EDT 2007


Bruce wrote:
> Hey all, it's been a while since I was on the Ale list
> - but I have a question, and figured this is the best
> place to ask.
>
> I am running a Netflow Collector (NFC5.0.2) and have a
> config file in XML. The config file basically
> associates applications with TCP and UDP ports. Since
> the config file is pretty limited, most of my traffic
> is not getting associated correctly.
>
> I pulled down a listing of well-known and registered
> ports from IANA, figuring on taking the scattershot
> approach. 
>
> A short section is here:
> "<case><value>	1	</value><label>	TCP_	tcpmux	-	1	-tcp
> </label></case>"
> "<case><value>	2	</value><label>	TCP_	compressnet	-	2
> -tcp	</label></case>"
> "<case><value>	3	</value><label>	TCP_	compressnet	-	3
> -tcp	</label></case>"
> "<case><value>	5	</value><label>	TCP_	rje	-	5	-tcp
> </label></case>"
> "<case><value>	7	</value><label>	TCP_	echo	-	7	-tcp
> </label></case>"
> "<case><value>	9	</value><label>	TCP_	discard	-	9	-tcp
> </label></case>"
> "<case><value>	11	</value><label>	TCP_	systat	-	11
> -tcp	</label></case>"
> "<case><value>	13	</value><label>	TCP_	daytime	-	13
> -tcp	</label></case>"
> "<case><value>	17	</value><label>	TCP_	qotd	-	17	-tcp
> </label></case>"
> "<case><value>	18	</value><label>	TCP_	msp	-	18	-tcp
> </label></case>"
> "<case><value>	19	</value><label>	TCP_	chargen	-	19
> -tcp	</label></case>"
> "<case><value>	20	</value><label>	TCP_	ftp-data	-	20
> -tcp	</label></case>"
> "<case><value>	21	</value><label>	TCP_	ftp	-	21	-tcp
> </label></case>"
> "<case><value>	22	</value><label>	TCP_	ssh	-	22	-tcp
> </label></case>"
> "<case><value>	23	</value><label>	TCP_	telnet	-	23
> -tcp	</label></case>"
> "<case><value>	25	</value><label>	TCP_	smtp	-	25	-tcp
> </label></case>"
> "<case><value>	27	</value><label>	TCP_	nsw-fe	-	27
> -tcp	</label></case>"
> "<case><value>	29	</value><label>	TCP_	msg-icp	-	29
> -tcp	</label></case>"
> "<case><value>	31	</value><label>	TCP_	msg-auth	-	31
> -tcp	</label></case>"
> "<case><value>	33	</value><label>	TCP_	dsp	-	33	-tcp
> </label></case>"
> "<case><value>	37	</value><label>	TCP_	time	-	37	-tcp
> </label></case>"
> "<case><value>	38	</value><label>	TCP_	rap	-	38	-tcp
> </label></case>"
> "<case><value>	39	</value><label>	TCP_	rlp	-	39	-tcp
> </label></case>"
> "<case><value>	41	</value><label>	TCP_	graphics	-	41
> -tcp	</label></case>"
> "<case><value>	42	</value><label>	TCP_	name	-	42	-tcp
> </label></case>"
> "<case><value>	42	</value><label>	TCP_	nameserver	-	42
> -tcp	</label></case>"
> "<case><value>	43	</value><label>	TCP_	nicname	-	43
> -tcp	</label></case>"
> "<case><value>	44	</value><label>	TCP_	mpm-flags	-	44
> -tcp	</label></case>"
>
> And what I want it to look like is here:
> <case><value>1</value><label>TCP_tcpmux-1-tcp</label></case>
> <case><value>2</value><label>TCP_compressnet-2-tcp</label></case>
> <case><value>3</value><label>TCP_compressnet-3-tcp</label></case>
> <case><value>5</value><label>TCP_rje-5-tcp</label></case>
> <case><value>7</value><label>TCP_echo-7-tcp</label></case>
> <case><value>9</value><label>TCP_discard-9-tcp</label></case>
> <case><value>11</value><label>TCP_systat-11-tcp</label></case>
> <case><value>13</value><label>TCP_daytime-13-tcp</label></case>
> <case><value>17</value><label>TCP_qotd-17-tcp</label></case>
> <case><value>18</value><label>TCP_msp-18-tcp</label></case>
> <case><value>19</value><label>TCP_chargen-19-tcp</label></case>
> <case><value>20</value><label>TCP_ftp-data-20-tcp</label></case>
> <case><value>21</value><label>TCP_ftp-21-tcp</label></case>
> <case><value>22</value><label>TCP_ssh-22-tcp</label></case>
> <case><value>23</value><label>TCP_telnet-23-tcp</label></case>
> <case><value>25</value><label>TCP_smtp-25-tcp</label></case>
> <case><value>27</value><label>TCP_nsw-fe-27-tcp</label></case>
> <case><value>29</value><label>TCP_msg-icp-29-tcp</label></case>
> <case><value>31</value><label>TCP_msg-auth-31-tcp</label></case>
> <case><value>33</value><label>TCP_dsp-33-tcp</label></case>
> <case><value>37</value><label>TCP_time-37-tcp</label></case>
> <case><value>38</value><label>TCP_rap-38-tcp</label></case>
> <case><value>39</value><label>TCP_rlp-39-tcp</label></case>
> <case><value>41</value><label>TCP_graphics-41-tcp</label></case>
> <case><value>42</value><label>TCP_name-42-tcp</label></case>
> <case><value>42</value><label>TCP_nameserver-42-tcp</label></case>
> <case><value>43</value><label>TCP_nicname-43-tcp</label></case>
> <case><value>44</value><label>TCP_mpm-flags-44-tcp</label></case>
>
> The label is the name - I am keeping TCP_ (and UDP_)
> at the start of the label, as the tool I use to
> display stats looks for the TCP and UDP character. I
> follow the IANA name with the port and protocol so I
> won't get duplicate application names (a lot of the
> apps. listen on both UDP and TCP).
>
> Any pointers? How do I get rid of the " character? I'm
> guessing there are tabs in the file, since I created
> it using Excel (I know, I should have figured a way to
> simply grab the IANA well-known ports page and process
> it directly). How do I get rid of tabs?
>
This strips every double quote and tab. Note that tr reads only
standard input, so the file has to be redirected in:

tr -d '"\t' < infile > outfile
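
If you want to check the result before touching the real file, here is a
minimal sketch that pushes one line shaped like your sample through the
same tr call. The function name strip_quotes_tabs is made up for
illustration. I'm also assuming the separators in your file really are
tab characters and that each record sits on a single line (the wrapping
in your mail looks like it came from the mail client).

```shell
# Hypothetical helper: delete every double quote and tab read from stdin.
strip_quotes_tabs() {
    tr -d '"\t'
}

# One record in the shape posted above, with tabs written as \t for printf.
printf '"<case><value>\t1\t</value><label>\tTCP_\ttcpmux\t-\t1\t-tcp\t</label></case>"\n' \
    | strip_quotes_tabs
# prints: <case><value>1</value><label>TCP_tcpmux-1-tcp</label></case>
```

If the output still has gaps in it, the separators are probably spaces
rather than tabs, and the set passed to tr would need adjusting.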

-- 
David Tomaschik
Moderator, LinuxQuestions.org
http://matir.wordpress.com



