Wednesday

A global identifier for Internet adult content

[Story]
Back in May '05, before starting my Network-on-Chip master thesis, I proposed another type of project as my master thesis. The topic had nothing in common with my actual field, but I thought I could come up with a better solution to identify Internet adult content (not to censor it). The project was rejected because telematics is not a strong research field at the ULPGC (microelectronics and signal processing are), and the tutors I tried to convince were more focused on low-quality, short-term profitable projects. While I can understand that short-term profits are really important for a company, I think university research should be long-term, high-quality research.
Since at that time I did not have a blog, this idea sat in a drawer until today ;-). Be patient and don't expect a high-quality analysis, because my main notes are back in the Canary Islands and I am writing from memory here in the Netherlands.
The idea was simple. A lot of entities spend money on banning adult-content webpages, from search engine filters such as Google's to parental-control filters for browsers. To be honest, I understand this situation, since I have a little sister (6 or 7 years old at the time I studied the project) who uses the Internet. Thus, I was afraid of how adult content could shock a child at an age when they are not able to process this kind of thing, no matter what you teach them.
[Main idea]
The idea of the project is to change the approach from extremely costly individual banning to cheap global tagging. To do so, the proposal follows a train-ticket model in which the webmaster tells the tagging entity whether or not the webpage/domain is explicit adult content. The tagging entity trusts the webmaster's tagging; still, there is a revision process focused on alerted webpages, and if an adult-content page was tagged as non-adult content, the webmaster will be required to pay a dissuasive price in order to retain its domain name.

What's the mystery? The global idea is simple, but the implementation is not, since right now there is a lot of complexity in the management of domain names. So let's take a look at this complexity.

The Historical Context
-1993: Network Solutions, Inc. (NSI) was granted an exclusive contract by the National Science Foundation (NSF) to be the sole domain name registrar for the .com, .net and .org Top Level Domain (TLD) names. NSI also maintained the central database of assigned names, called WHOIS. Network Solutions acted as a de facto registrar, selling names directly to end users.
-1998: On January 28, Postel, on his own authority, emailed eight of the twelve operators of the Internet's regional root servers and instructed them to change the root zone server from Network Solutions' A.ROOT-SERVERS.NET (198.41.0.4) to DNSROOT.IANA.ORG (198.32.1.98). The operators complied with Postel's instructions, thus splitting control of Internet naming between IANA and the four remaining U.S. Government roots at NASA, the .mil server, BRL and NSI. He soon received a telephone call from a furious Ira Magaziner, President Clinton's senior science advisor, who instructed him to undo the change, which he did. Within a week, the US NTIA issued its "Green Paper" asserting the US government's definitive authority over the Internet DNS root zone.
-2000: Network Solutions (NSI) was acquired by VeriSign for $21 billion.
-2003: In negotiations with ICANN, VeriSign gave up operation of the .org top-level domain in return for continued rights over .com, the largest domain, with more than 34 million registered domain names.
-Mid-2005: The existing contract for the operation of .net expired, and five companies, including VeriSign, bid for its management. On June 8, 2005, ICANN announced that VeriSign had been approved to operate .net until 2011.

Some light in the craziness of domain names.
The key element is to work with the WHOIS database. The WHOIS database manager, as the tagging entity, would take on a new task: to examine, through a computerized process backed by a final human report, whether or not a webmaster follows its web tag, and, if the webmaster does not, to increase the ~$6/year fee to keep that domain name. Thus, it does not matter which company is the registrar, and with the extra money the WHOIS database manager could afford a detailed human check of each web page that raises an alert. In this system it is key that search engines or other companies that crawl/index the web send alerts about suspect pages to the tagging entity. Otherwise, the tagging entity would have to build a crawler by itself, but it is easier and cheaper to reward each successful alert made by the search companies. Therefore the tagging entity would rate the quality of the alerts from different suppliers (I am thinking of Google as the most capable) and would build a weighted queue of pending alerts taking into account each supplier's history of success.
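To make the weighted queue concrete, here is a minimal sketch of how the tagging entity could prioritize pending alerts by each supplier's track record. Everything here (class names, the optimistic prior, the confirmed/total ratio) is my own illustrative choice, not part of any real WHOIS system:

```python
import heapq

class AlertQueue:
    """Prioritize incoming alerts by each supplier's history of confirmed reports."""

    def __init__(self):
        self._heap = []     # entries: (-priority, counter, domain, supplier)
        self._counter = 0   # tie-breaker so heapq never compares domain strings
        self._history = {}  # supplier -> (confirmed, total)

    def _success_rate(self, supplier):
        # Mild optimistic prior (1 confirmed out of 2) for unknown suppliers.
        confirmed, total = self._history.get(supplier, (1, 2))
        return confirmed / total

    def submit(self, supplier, domain):
        """A supplier (e.g. a search engine crawler) reports a suspect page."""
        heapq.heappush(
            self._heap,
            (-self._success_rate(supplier), self._counter, domain, supplier))
        self._counter += 1

    def next_alert(self):
        """Return the pending alert from the most trusted supplier."""
        _, _, domain, supplier = heapq.heappop(self._heap)
        return domain, supplier

    def record_outcome(self, supplier, confirmed):
        """Feed back whether the human report confirmed the supplier's alert."""
        c, t = self._history.get(supplier, (1, 2))
        self._history[supplier] = (c + bool(confirmed), t + 1)
```

A supplier whose alerts keep being confirmed by the human reviewers drifts to the front of the queue; one that cries wolf drifts to the back, which is exactly the reward scheme described above.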
This is the main idea, but there are a lot of elements to research. For example, I have thought of 6 tags:
1) Explicit adult content.
2) Unclassified (reasonable alert system and no extra payment). The status changes to explicit adult content if that is the situation. This will be the default status at the beginning.
3) Non-adult explicit (soft alert system, extra payment and change of tag if needed). Adult-content sites are 80-100% explicit adult images, so with this intermediate tag it should be extremely difficult to raise a false alert on general-content pages such as http://www.nytimes.com/, art pages, blogs or any other "normal" web page with less than 10-30% explicit adult pictures.
4) Explicit child content (high alert system, extra payment and change of tag if needed).
5) Multidomain. This is the biggest problem of the project. Example: www.ulpgc.com/USERS/John. It is also the part where there is room to design and implement a solution.
6) Untrustable multidomain.
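The six tags above can be written down as a small lookup table. The alert levels and the extra-payment flags below just paraphrase my list; this is an illustration, not a real registry schema:

```python
from enum import Enum

class Tag(Enum):
    """The six proposed tags, in the order listed above."""
    EXPLICIT_ADULT = 1
    UNCLASSIFIED = 2             # default status for new domains
    NON_ADULT_EXPLICIT = 3       # "normal" pages with some explicit images
    EXPLICIT_CHILD = 4
    MULTIDOMAIN = 5              # responsibility delegated to a local server
    UNTRUSTABLE_MULTIDOMAIN = 6  # local server broke the delegation protocol

# Per-tag policy: (alert level, extra payment if the tag turns out false).
POLICY = {
    Tag.EXPLICIT_ADULT: ("none", False),
    Tag.UNCLASSIFIED: ("reasonable", False),
    Tag.NON_ADULT_EXPLICIT: ("soft", True),
    Tag.EXPLICIT_CHILD: ("high", True),
    Tag.MULTIDOMAIN: ("delegated", False),
    Tag.UNTRUSTABLE_MULTIDOMAIN: ("sanction", True),
}
```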

Therefore the idea of the project was to clarify this whole scheme and implement a small prototype, so the project tasks are as follows:
[Main Tasks]
Main task 1:
-Global solution for simple domains. Study the tags. Add an element to the Resource Record. Create a secure policy to access the tag, or use the WHOIS protocols.

Main task 2:
-Multi-domain. Example: iuma.ulpgc.es/USERS/alumnos/JohnTravolta

-Propose a special tag (multidomain) to delegate responsibility to a local server.
-Create a secure protocol to connect a tag query to a WHOIS database with an answer from the local server which manages that subdomain.
-Define a protocol for communication between the WHOIS database manager and the local server, whose main goal is to let the WHOIS database manager change the local server's tag status. If the local server does not follow this protocol to change a tag, the WHOIS database manager will set the status of that multi-domain to untrustable multidomain, and there will be an economic sanction when the multidomain renews its domain name.
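The delegation in Main task 2 can be sketched as a two-step lookup: the WHOIS record answers directly for simple domains, and for multidomains the query is forwarded to the local server that manages the subpaths. All the records, names and the fallback rule below are hypothetical illustrations, not a real WHOIS schema or protocol:

```python
# Toy WHOIS tag records (hypothetical data).
WHOIS_TAGS = {
    "iuma.ulpgc.es": "multidomain",   # delegates to its local server
    "example.com": "explicit-adult",  # simple domain, answered directly
}

# What the local server managing iuma.ulpgc.es would answer for its subpaths.
LOCAL_SERVER_ANSWERS = {
    "iuma.ulpgc.es": {"/USERS/alumnos/JohnTravolta": "unclassified"},
}

def query_tag(domain, path="/"):
    """Resolve a tag: direct for simple domains, delegated for multidomains."""
    tag = WHOIS_TAGS.get(domain, "unclassified")
    if tag != "multidomain":
        return tag
    # Delegation step: in the real design this would be a secure, signed
    # request to the local server; here it is a plain table lookup.
    answers = LOCAL_SERVER_ANSWERS.get(domain, {})
    # If the local server cannot answer for this path, fall back to the
    # conservative untrustable-multidomain status described above.
    return answers.get(path, "untrustable-multidomain")
```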
Main task 3:
-Create prototypes of all these elements and study the efficiency and security of the proposed protocols. Study how the tags are propagated along the DNS servers (times, efficiency, etc.).
[FAQ]
So this is, as far as I remember it, the draft of the rejected project. I understand you might have some questions, so here are my answers to the typical ones:
Q1: Multidomain is key. A1: Yes, it is key, and that's why this project is interesting.
Q2: DNS is not only about the Web. A2: Other services can be tagged as unclassified.
Q3: It will be like censorship. A3: The idea is to be flexible and simply to improve things, not to be a policeman. The idea is that you set the filter in your browser; it is never set by an ISP or a government. NEVER. If it is not possible to guarantee that China won't use these tags to censor, it is better to close this project forever.
Q4: People can access adult content directly through the IP protocol, bypassing DNS. A4: AGAIN, the idea is not to censor; the idea is to improve search engines, child filters for browsers and the typical click-through from one page to another. Thus, the tags will be used only if the final user wants to filter. (You can always access whatever you want or, on the other hand, filter adult-tagged pages from your browser. You make the decision. For example, if you are a porn addict you can make a search engine search only for pages with the porn tag. This project is not about censorship; it is about efficiency.)
Q5: Once this project prototype is made, it is more about politics than about engineering. A5: Absolutely true.
Q6: I don't see the money to support the final human reports. A6: The domain cost will increase if users lie about their tags. That money should cover the human report cost, but it is important to have an automated engine producing trusted alerts (as I said, I think Google and other search engines might have a good one).
Q7: It is dangerous to have an entity controlling web content. A7: It is not web-content control; it is tag identification made available to the final user. In addition, it is only adult/non-adult tagging, and there is soft detection (sites with less than 30% adult content) to produce zero false positives. At the same time, the point is to support net neutrality; I can see some extremist "family" groups clamoring for ISPs to filter tags with adult content, and big Washington lobbies (don't forget that ICANN is controlled by the USA) pressing the US Congress to make more tags and to replace the soft detection (over 30% adult content) with a hard and stupid detection (over 0.1% adult content). To answer this, I can say that we need to be strong supporters of net neutrality, and we will need that support with this project and without it, because in the coming years we (as the Internet community) will be facing a lot of stupidity/insanity coming from lobby groups.

The Italian Man Who went to Malta



Besides Italians, we Spanish guys also fit the video ;-).

Monday

Nautilus script line count in Ubuntu Gutsy Gibbon 7.10

[Story]
I was in an interview two weeks ago and the interviewer popped an unexpected question.
Interviewer- How many lines of code does your master thesis project have?
Ray- "To be honest, I have NO IDEA." (My NoC simulator is a modular system with around 125 source files.)
Interviewer- But can you give me an approximation?
Ray- "I don't have a clue." [(You should have seen his face.) I tried to save the situation by giving him the CD with the project, but he didn't take it, so I guess I lost one job opportunity. (The interesting thing is that I believe this department needs me more than I need them. There are not many engineers with the background, passion and enthusiasm this research topic needs, and I have all of them ;-). Life is just like that; my experience tells me that sometimes, when you follow a different path than the one you were planning, the result is much better than the initial plan.)]

[Problem]
How do we count the number of code lines across many files and folders?
[Solution]
A Nautilus script.
I searched the Internet and found a script package from "Nicolas Cuntz (ni_ka_ro), 16.4.2005" with a line-count script, but since it is an old implementation it does not work with the latest versions of Nautilus and bash (or at least it does not work for me).

I have updated some lines to solve the main problems, and right now I have a working solution. It is important to point out that there is an error during execution, but it does not affect the counting. In the future I will fix it and make a clean script, but in the meantime you can download this functional version here.

-How to install & use it
Extract it and move the files to the folder /home/user/.gnome2/nautilus-scripts/.
Don't forget to move the hidden folder '.scripts' to the same destination.
Give execute permission to all the files inside .scripts and also to the 'line_count' script in /home/user/.gnome2/nautilus-scripts/.
That's all. Select the folder with the project, then Right click->Scripts->line_count.
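If the Nautilus script ever breaks again, the same count is easy to reproduce with a short stand-alone Python sketch (my own quick version, not the packaged script; it counts newline characters in every regular file, which is roughly what `wc -l` does):

```python
import os

def count_lines(*folders):
    """Recursively count newline characters in every regular file under *folders*."""
    total = 0
    for folder in folders:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                try:
                    with open(os.path.join(root, name), "rb") as f:
                        total += f.read().count(b"\n")
                except OSError:
                    pass  # skip unreadable files (broken links, permissions)
    return total
```

Calling `count_lines("/home/user/my_project")` prints nothing by itself; wrap it in a `print()` to get the total for a project folder.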

Yes, I know, the recursion through folders is just great.
I hope you find it useful, and don't forget to reply if you have a better solution.