Introduction to URL Filtering
Suppose you type the name of your favorite social networking site into your web browser at the office and, instead of the site, you see the message “The policy of your organization does not allow navigation to this website”. This happens because your IT department has put URL filtering in place.
URL filtering categorizes websites on the Internet and allows or blocks access to them for the users of an organization. The decision is based either on a database of websites already classified into categories (maintained by URL filtering providers) or on real-time classification.
If required, URL filtering can also be applied only at certain times of the day or on certain days of the week.
WHY IS A URL FILTER REQUIRED?
URL filtering is required to stop users of an organization from accessing, during working hours, websites that:
- Dramatically reduce productivity
- Expose them to objectionable content in the workplace
- Are bandwidth-intensive and therefore consume too many resources
- Could lead to leakage of confidential or critical information
HOW IS URL FILTERING PERFORMED?
A URL filtering provider maintains a database in which most Internet websites are classified into categories.
Based on these categories, the filter allows or denies the users of an organization access to websites, either all the time or only during certain times of the day.
IT department personnel can set the policies that determine which categories of websites are allowed or blocked for the users of an organization, through a web-based interface offered by the URL filter.
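A category policy with optional time windows can be sketched as follows. This is a minimal illustration, not the interface of any particular product: the names `CategoryRule`, `is_allowed`, and `RULES`, and the category labels, are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class CategoryRule:
    """Hypothetical rule: block a category, optionally only in a time window."""
    category: str
    block_from: time = time.min            # start of the blocking window (inclusive)
    block_until: time = time.max           # end of the blocking window (inclusive)
    weekdays: frozenset = frozenset(range(7))  # 0 = Monday ... 6 = Sunday

    def blocks(self, when: datetime) -> bool:
        """True if this rule is active at the given moment."""
        return (when.weekday() in self.weekdays
                and self.block_from <= when.time() <= self.block_until)


def is_allowed(category: str, when: datetime, rules: list) -> bool:
    """Return False if any rule blocks this category at the given moment."""
    return not any(r.category == category and r.blocks(when) for r in rules)


# Example policy (illustrative): social networking is blocked Monday to Friday
# between 09:00 and 17:00, and gambling is blocked at all times.
RULES = [
    CategoryRule("social-networking", time(9, 0), time(17, 0), frozenset(range(5))),
    CategoryRule("gambling"),
]
```

With these rules, a request for a social networking site on a weekday morning is denied, while the same request in the evening or at the weekend is allowed. Categories without a rule are allowed by default in this sketch; a real deployment might prefer the opposite default.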
To enforce these policies, a local hardware appliance or a software application running on a server connects to the central database of the URL filtering provider. This setup also allows individual websites to be blocked.
There may also be a local database that is completely or partially updated from the central database. Updating it completely, however, can cause performance problems such as high bandwidth or memory usage.
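One way to picture a partially updated local database is a bounded cache that asks the central database only when a domain is not yet known locally. The sketch below assumes a hypothetical `central_lookup` function standing in for the provider's service; the class name, the size limit, and the oldest-first eviction are illustrative choices, not a description of any real product.

```python
from typing import Callable, Dict, Optional


class LocalCategoryCache:
    """Hypothetical local database holding a partial copy of the central one."""

    def __init__(self, central_lookup: Callable[[str], Optional[str]],
                 max_entries: int = 1000):
        self._central_lookup = central_lookup  # stand-in for a call to the provider
        self._max_entries = max_entries        # caps memory use on the appliance
        self._entries: Dict[str, str] = {}
        self.central_queries = 0               # counts lookups that left the appliance

    def category_of(self, domain: str) -> Optional[str]:
        """Return the category from the local copy, else ask the central database."""
        if domain in self._entries:
            return self._entries[domain]
        self.central_queries += 1
        category = self._central_lookup(domain)
        if category is not None:
            if len(self._entries) >= self._max_entries:
                # Evict the oldest entry (dicts preserve insertion order).
                self._entries.pop(next(iter(self._entries)))
            self._entries[domain] = category
        return category


# Usage with a dictionary playing the role of the central database.
central = {"a.test": "sports", "b.test": "news", "c.test": "betting"}
cache = LocalCategoryCache(central.get, max_entries=2)
```

Keeping the local copy small limits memory and bandwidth use, at the cost of an extra round trip for domains that have been evicted or never seen.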
Some providers add new URLs to the database automatically, eliminating the need to submit them manually for inclusion.
A website can be classified into a single category or into multiple categories, and blocking can be applied accordingly.
For example, access to a website may be allowed if it is only in the sports category but denied if it is in both the sports and betting categories.
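The sports and betting example can be expressed as a simple set check. This is a minimal sketch; `site_is_allowed`, `BLOCKED`, and the category names are assumptions chosen for illustration.

```python
def site_is_allowed(site_categories: set, blocked_categories: set) -> bool:
    """Allow a site only if none of its categories is blocked."""
    return site_categories.isdisjoint(blocked_categories)


# Illustrative policy: betting is blocked, sports is not.
BLOCKED = {"betting"}

sports_only = site_is_allowed({"sports"}, BLOCKED)
sports_and_betting = site_is_allowed({"sports", "betting"}, BLOCKED)
```

Here `sports_only` is `True` and `sports_and_betting` is `False`, because a single blocked category is enough to deny access.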
Generally, URL filtering companies evaluate websites by domain name (in addition to individual URLs), since a single domain can contain many URLs and their number tends to grow.
Sub-domains also need to be classified in addition to the main domains (for blogs, etc.). Intermediate pages need to be classified as well, either on their own or based on the primary pages they lead to (as with translation sites or sites that display images from other servers).
Websites that contain content in multiple languages may need to be evaluated in a similar way.
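Classifying sub-domains separately from the main domain suggests a "most specific match wins" lookup. The following sketch assumes a hypothetical `CATEGORIES` table keyed by host name; the host names and categories are made up for the example.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical category table keyed by host name (illustrative entries only).
CATEGORIES = {
    "blogs.example": "blog-hosting",
    "gambling-tips.blogs.example": "betting",
    "news.example": "news",
}


def category_for_url(url: str) -> Optional[str]:
    """Return the category of the most specific matching host, if any."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Try the full host first, then strip one leading label at a time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in CATEGORIES:
            return CATEGORIES[candidate]
    return None
```

A blog hosted on a sub-domain with its own entry gets that entry's category, while any other blog on the same platform falls back to the category of the main domain.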
REAL-TIME CLASSIFICATION OF WEBSITES
The Internet is so large that it is practically impossible to categorize every website in advance.
So when a user accesses a website that has not yet been categorized, the URL filtering system classifies it ‘on the fly’, that is, in real time.
This typically takes only a few hundred milliseconds, and the local databases are then updated automatically along with the central database.
This classification into categories is done automatically by machine learning systems (automated software applications similar to web crawlers) that retrieve keywords (sometimes all the words) from the content of a website and, taking the context into account, decide on the most appropriate category.
The links from a website to other websites are also analyzed to help place it in the relevant category.
These systems are trained by human experts, who feed them training data (websites that the experts have already classified into categories) and adjust their settings over a considerable period until the automated results match the human classification.
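As a rough illustration of training on human-labeled examples and then classifying by keywords, the sketch below uses a tiny naive Bayes classifier. This is a swapped-in textbook technique chosen for brevity, not the method of any particular provider, and it ignores link analysis. The class name, the training texts, and the categories are all invented for the example.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Optional


def tokenize(text: str) -> list:
    """Split page text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


class KeywordClassifier:
    """Tiny multinomial naive Bayes over page words (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of pages
        self.vocabulary = set()

    def train(self, text: str, category: str) -> None:
        """Record one human-labeled page."""
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text: str) -> Optional[str]:
        """Return the most likely category, or None if nothing was trained."""
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        best_category, best_score = None, -math.inf
        for category, docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior plus log likelihood with add-one smoothing.
            score = math.log(docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1)
                                  / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical training data standing in for pages labeled by human experts.
TRAINING = [
    ("football match score league goal team", "sports"),
    ("tennis tournament player final score", "sports"),
    ("casino bet odds jackpot poker wager", "betting"),
    ("place your bet odds bonus wager", "betting"),
]

classifier = KeywordClassifier()
for page_text, label in TRAINING:
    classifier.train(page_text, label)
```

After training, a page about league tables and football scores is assigned to the sports category, and a page about odds and bets to the betting category. A production system would use far more training data and additional signals.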