Abstract- Web mining has attracted growing attention from users because of its interfaces and the large quantity of knowledge available on the web. Users are interested in searching this wealth of useful data, but extraction is still restricted for resources such as unlabeled photos. This paper presents a framework for the automated face identification task that takes advantage of content-based image retrieval (CBIR) and search-based image retrieval (SBIR) schemes to mine the huge assortment of poorly labeled photos on the internet. Because the images are poorly labeled, it is difficult to recognize similar images; to address this, we propose an updated unsupervised label refinement (ULR) approach. A search can be performed using either the name of an image or the image itself: if a match is found, the similar images are displayed; otherwise the output is null. Cluster analysis is used to group similar images. Association analysis is used to maintain a count for each image, based on the number of times the image is searched.
Keywords- Face annotation,
Web mining, Face detection, Indexing, Association analysis, Cluster analysis.
I. INTRODUCTION
Data mining has become more important in society because of the growing quantity of data and the need to turn such data into useful information and knowledge. Mining, or drawing out knowledge from a huge collection of data, is called data mining. The main aim of data mining is to extract information from a data set and transform it into a comprehensible structure for future use.
Data mining is one step in the knowledge discovery process. Knowledge discovery consists of a series of steps: data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation. Data mining applies techniques that extract data patterns.
A data mining system has an engine that comprises a set of functions for tasks such as characterization, association and correlation analysis, classification, and prediction. Web mining has gained more attention from users with its interfaces and the large quantity of knowledge available. Extracting the patterns that users access in a distributed information environment is called web mining. A web search based on a single keyword may return hundreds of links to web pages containing the keyword, but most of those links are only weakly related to what the user wants to find.
Patterns lead to the discovery of interesting associations. Frequent patterns are patterns that occur frequently in a data set. Market basket analysis is an example of frequent itemset mining.
Association analysis is the process of finding interesting relationships hidden in large amounts of data. It is used to uncover relationships among related data in a database, relational database, or other information repository.
Association rules are used to find relationships between objects that are frequently used together. Applications of association rules include basket data analysis, classification, cross-marketing, clustering, catalog design, and loss-leader analysis.
For example, if a customer buys rice, he may also buy dhal; if a customer buys a mobile phone, he may also buy a memory card. In this paper the item sets are images, and related images are displayed based on either the content (the image itself) or the image name.
Association rules use two measures, support and confidence. They identify relationships by analyzing data for frequently used patterns. Association rules that satisfy both a user-specified minimum support and a user-specified minimum confidence at the same time are explored.
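As a minimal sketch of the two measures, the following Python code computes support and confidence over a small list of transactions and checks a rule against user-specified thresholds. The basket contents, the function names, and the threshold values are illustrative assumptions and are not taken from the proposed system.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0.0:
        return 0.0
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / base


def rule_holds(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str],
               min_support: float,
               min_confidence: float) -> bool:
    """A rule is kept only if it meets both user-specified thresholds."""
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    return (support(transactions, antecedent | consequent) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)


# Hypothetical shopping baskets (illustrative data only).
baskets = [
    frozenset({"rice", "dhal"}),
    frozenset({"rice", "dhal", "oil"}),
    frozenset({"rice"}),
    frozenset({"mobile", "memory card"}),
]

print(support(baskets, {"rice", "dhal"}))
print(confidence(baskets, {"rice"}, {"dhal"}))
print(rule_holds(baskets, {"rice"}, {"dhal"}, min_support=0.4, min_confidence=0.6))
```

In this toy data the rule "rice implies dhal" appears in two of the four baskets, and two of the three baskets containing rice also contain dhal, so the rule passes any thresholds at or below those values.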
With the widespread use of digital cameras and the rapid growth of social media for internet-based photo sharing, recent years have witnessed an explosion in the number of digital photos captured and stored by users.
A major concern is the recognition of images, that is, identifying or verifying images against the database where the images are stored. Image recognition is a vital part of the human perception system. The earliest work on image recognition can be traced back at least to the 1950s in psychology and to the 1960s in the engineering literature. Some of the earliest studies include work on the facial expression of emotions by Darwin. Later, additional cues such as identification number, race, age, gender, facial expression, or speech were used to narrow the search and thereby enhance recognition.
The solution to the problem involves segmentation of faces (face detection) from cluttered scenes, feature extraction from the face regions, recognition, and verification; indexing may also be applied to the images. In identification problems, the input to the system is either an image or the name of an image, and the system outputs the similar images from a database of known individuals, or else outputs null. In verification problems, the system needs to confirm or reject the identity of the input image.
In most cases, photos shared by users on the web are facial images. Some facial images are labeled with names, some are weakly labeled, and some are not labeled properly. This has motivated an important technique: finding facial images automatically. It is useful to many web applications; for example, online photo-sharing sites can automatically label user-uploaded photos to support online photo search. This paper presents a method for labeling facial images by mining the web, where a massive number of weakly labeled images are freely available.
The work addresses the automated face annotation (identification) task by taking advantage of content-based image retrieval (CBIR) and search-based image retrieval (SBIR) techniques to mine the large number of poorly labeled images on the internet. The framework is model-free and data-driven. The main objective of these schemes is to assign correct name labels to a given query image.
Given a novel facial image for annotation, we first retrieve a short list of the top n most similar facial images from a poorly labeled facial image database, where similarity is computed on the pixel (binary) values, and then annotate the facial image with the names (labels) associated with those top n images. One challenge faced by CBIR and SBIR techniques is how to effectively identify and shortlist similar facial images and their weak labels for the face name annotation task. To solve this, we use a novel updated unsupervised label refinement (ULR) scheme based on machine learning techniques.
We also propose a cluster-based approximation (CBA) algorithm and an association rule-based approximation (ABA) algorithm to improve efficiency. The system also lets users search for similar images by giving an image as input.
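The retrieve-then-annotate idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: faces are represented by short numeric feature vectors, similarity is plain Euclidean distance, and the name is chosen by a simple majority vote among the top n matches. The function names and the toy database are hypothetical, and the label refinement step itself is not modelled here.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def annotate(query: Vector,
             database: List[Tuple[Vector, str]],
             n: int = 3) -> Optional[str]:
    """Return the most common name among the n nearest database faces.

    Returns None (the "null" output) when the database is empty.
    """
    if not database:
        return None
    nearest = sorted(database, key=lambda item: euclidean(query, item[0]))[:n]
    votes = Counter(name for _, name in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical weakly labeled database: the third entry carries a noisy label.
faces = [
    ((0.0, 0.0), "alice"),
    ((0.1, 0.0), "alice"),
    ((0.0, 0.2), "bob"),   # looks like the first two, but is mislabeled
    ((5.0, 5.0), "bob"),
    ((5.1, 4.9), "bob"),
]

print(annotate((0.05, 0.05), faces, n=3))
```

Voting over several neighbours is what lets a single noisy label be outvoted by the correct labels around it; with n = 1 the result depends entirely on the closest image.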
II. LITERATURE SURVEY
Dayong Wang, Steven C.H. Hoi, Ying He, and Jianke Zhu, in "Mining Weakly Labeled Web Facial Images for Search-Based Face Annotation," propose a framework of search-based face annotation (SBFA) that mines weakly labeled facial images freely available on the World Wide Web (WWW). It exploits the list of most similar facial images and their noisy labels, and uses an unsupervised label refinement (ULR) approach to refine the labels of web facial images using machine learning techniques.
Zhong Wu, Qifa Ke, Jian Sun, and Heung-Yeung Shum, in "Scalable Face Image Retrieval with Identity-Based Quantization and Multi-Reference Re-ranking," aim to build a scalable face image retrieval system and develop a new scalable face representation using both local and global features. In the indexing stage, they exploit special properties of faces to design new component-based local features, which are subsequently quantized into visual words using a novel identity-based quantization scheme.
Preeti Chouhan and Mukesh Tiwari, in "Feature Extraction Techniques for Image Retrieval Using Data Mining and Image Processing Techniques," provide a basic informative review of the fields in which data mining is applied, including manufacturing, telecommunication, education, fraud detection, and marketing. The review covers methods such as clustering, correlation, association, and neural networks, and introduces concepts of image mining. Image mining deals with the association of image data and the extraction of hidden data.
T. Geetha, in "Fast Frequent Pattern Mining Using Vertical Data Format for Knowledge Discovery," notes that Apriori-based schemes, Frequent Pattern growth (FP-growth), and Equivalence CLASS Transformation (ECLAT) are the most widely used approaches for extracting frequent patterns. A quantitative investigation of changing the data format is also carried out, aiming for better results with less computation.
A. Existing System
In the existing system, object recognition schemes are used to train classification models from human-tagged training images, or to infer the link between annotated keywords and images. Given limited training data, semi-supervised learning methods have been used for image identification in classical classification models.
Limitations:
1. Similar, clear images were not displayed using the local binary system.
2. Poorly rendered or poorly labeled images are difficult to identify.
3. It always produces approximate results.
4. There was no ranking (count) scheme.
B. Proposed System
This paper presents a framework for search-based and content-based image retrieval techniques that mine the weakly named images that are available. Because the images are poorly labeled, it is difficult to identify similar images. To identify the poorly labeled similar images, we propose an updated unsupervised label refinement (ULR) approach.
To perform a search on images, we use the ULR algorithm on the binary format of the images. A search can be performed using either the name of an image or the image itself: if a match is found, the similar images are displayed; otherwise the output is null. Images are grouped using cluster approximation. In addition, a count is maintained for each searched image based on user clicks.
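The two query modes and the null output can be illustrated with a small in-memory store. This is a sketch under several assumptions that the paper does not specify: each image is reduced to a fixed-length sequence of bits, similarity is the fraction of matching bits, and a hypothetical threshold of 0.75 decides whether two images match. The class and method names are invented for illustration.

```python
from typing import Dict, List, Optional, Sequence

Bits = Sequence[int]


def bit_similarity(a: Bits, b: Bits) -> float:
    """Fraction of positions at which two equal-length bit sequences agree."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


class ImageStore:
    """Toy store that supports search by name and search by image content."""

    def __init__(self) -> None:
        self.bits: Dict[str, Bits] = {}    # image id -> binary representation
        self.names: Dict[str, str] = {}    # image id -> label given at upload

    def add(self, image_id: str, name: str, bits: Bits) -> None:
        self.bits[image_id] = bits
        self.names[image_id] = name

    def search_by_name(self, name: str) -> Optional[List[str]]:
        """Return ids whose label matches `name` (case-insensitive), else None."""
        hits = [i for i, n in self.names.items() if n.lower() == name.lower()]
        return hits or None

    def search_by_image(self, bits: Bits, threshold: float = 0.75) -> Optional[List[str]]:
        """Return ids whose bits are similar enough to the query, else None."""
        hits = [i for i, b in self.bits.items()
                if bit_similarity(bits, b) >= threshold]
        return hits or None


store = ImageStore()
store.add("img1", "Alice", (1, 0, 1, 1))
store.add("img2", "alice", (1, 0, 1, 0))
store.add("img3", "Bob", (0, 1, 0, 0))

print(store.search_by_name("ALICE"))
print(store.search_by_image((1, 0, 1, 1)))
print(store.search_by_name("carol"))
```

Returning `None` rather than an empty list mirrors the "output is null" behaviour described in the text; a real system would compare extracted features rather than raw bits.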
Advantages:
1. Similar, clear images are retrieved based on the image itself or the name of the image.
2. Images are easy to retrieve because names are given to the images.
3. It produces accurate results.
4. There is a ranking (count) scheme based on the number of times the user searched for a particular image.
We have 4 important modules in this process:
Labeling Images: Images are uploaded by giving a label (name) to each image.
Content-based Image Retrieval: In this module, the input is an image and the output is a group of images similar to the input image, or else null. The query by image content (QBIC) method is used.
Search-based Image Retrieval: In this module, the input is a name and the output is a group of images matching the input name, or else null. The query by image name (QBNC) method is used.
Ranking Scheme: A count is recorded for each image that is searched.
C. Architecture
Figure 1
Figure 1 illustrates the system flow of the proposed framework of search-based face annotation, which consists of the following steps:
1. Collection of images, labeling and storing
2. Detection and feature extraction based on the input
3. Performing indexing and collecting the labeled data using the ULR technique
4. Face annotation, where similar images are retrieved using cluster analysis
5. Face annotation by ranking scheme using association analysis
The first 3 steps are usually conducted before the test phase of an image identification task, while the last two steps are conducted during the test phase, which should usually be done very efficiently.
We briefly describe each step below. The first step is the collection of facial images, as shown in Figure 1, in which we collect the images using the Google search engine. Given the nature of web images, the collected images may be noisy: they do not always correspond to the correct name, and such images are weakly or poorly labeled.
The second step is to detect faces and extract image features; we use the unsupervised face alignment technique proposed in [4]. For facial feature representation, we extract the GIST texture features [5] to represent the extracted faces. The third step is to index the extracted features by applying an efficient high-dimensional indexing technique; for this we use locality sensitive hashing (LSH) [6]. An unsupervised learning scheme is then used to enhance the label quality of the weakly labeled facial images, which is important in the search process. The first 3 steps are the phases involved in the updated ULR algorithm. The fourth step is the grouping of related images (K similar images) using the cluster approximation algorithm. The last step is face annotation by counting user clicks for each searched image, which is done using association analysis.
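To make the indexing step more concrete, the sketch below shows one common form of locality sensitive hashing, random-hyperplane hashing with a single hash table. The paper does not state which hash family or parameters are used, so the choice of random hyperplanes, the class name `HyperplaneLSH`, and the values of `n_bits` and `seed` are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class HyperplaneLSH:
    """Single-table LSH: vectors with the same sign pattern share a bucket."""

    def __init__(self, dim: int, n_bits: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each random hyperplane contributes one bit of the hash key.
        self.planes: List[List[float]] = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)
        ]
        self.buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)

    def _key(self, vec: Sequence[float]) -> Tuple[int, ...]:
        # Bit is 1 when the vector lies on the non-negative side of the plane.
        return tuple(
            1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
            for plane in self.planes
        )

    def add(self, image_id: str, vec: Sequence[float]) -> None:
        """Store an image id in the bucket selected by its feature vector."""
        self.buckets[self._key(vec)].append(image_id)

    def candidates(self, vec: Sequence[float]) -> List[str]:
        """Return ids stored in the query's bucket (possibly an empty list)."""
        return list(self.buckets.get(self._key(vec), []))


# Hypothetical 4-dimensional feature vector standing in for a GIST descriptor.
index = HyperplaneLSH(dim=4, n_bits=6, seed=1)
feature = [0.3, -1.2, 0.8, 2.0]
index.add("face_001", feature)
print(index.candidates(feature))
```

Only the images in the query's bucket need to be compared exactly, which is what keeps the test phase fast. Practical systems usually use several tables so that near neighbours separated by one hyperplane are still found.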
IV. ALGORITHMS
Updated ULR algorithm
Input: Image
Output: Similar Images / NULL
Begin
Collection of images, labeling and storing
Detection and feature extraction based on the input
Performing indexing and collecting the labeled data using the ULR technique
End
Cluster Based Approximation Algorithm: The number of variables in the extracted image features is a * b, where a = number of facial images in the retrieval database and b = number of distinct names.
In this paper the strategy can be applied in two different phases of image retrieval:
1. On the "image itself," which can be used to separate all the a facial images into groups of similar images.
2. On the "image name," which can be used to separate the b names into groups.
Then, based on the given input, the similar images of the respective cluster are displayed. In this paper the k-nearest neighbors (k-NN) technique is used to group the images. The k-NN algorithm is used for classification and regression. In both cases, the input consists of the k closest training examples. In k-NN classification, the output is a class membership: an object is classified by a majority vote of its neighbors. If k = 1, the object is simply assigned to the class of its single nearest neighbor. In k-NN regression, the output is the property value for the object, computed as the average of the values of its k nearest neighbors.
Association Based Approximation Algorithm: Here, a count is maintained for every user click on a retrieved image, and that count is reflected in the clusters for the respective images.
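The click-count ranking described above can be sketched as follows. This is an illustrative assumption about how the counts might be kept, not the paper's implementation: clusters are given as a mapping from a cluster id to its member image ids, and the class name `ClickRanker` and the file names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


class ClickRanker:
    """Keeps a per-image click count and orders cluster members by it."""

    def __init__(self, clusters: Dict[str, List[str]]) -> None:
        self.clusters = clusters
        self.clicks: Counter = Counter()

    def record_click(self, image_id: str) -> None:
        """Increase the count each time a user clicks a retrieved image."""
        self.clicks[image_id] += 1

    def ranked(self, cluster_id: str) -> List[str]:
        """Members of a cluster, most clicked first; ties keep stored order."""
        members = self.clusters.get(cluster_id, [])
        return sorted(members, key=lambda image_id: -self.clicks[image_id])


ranker = ClickRanker({"cluster_a": ["a.jpg", "b.jpg", "c.jpg"]})
for clicked in ["b.jpg", "c.jpg", "b.jpg"]:
    ranker.record_click(clicked)
print(ranker.ranked("cluster_a"))
```

Because Python's sort is stable, images that have never been clicked stay in their original order after the clicked ones, so a new cluster is displayed unchanged until users start interacting with it.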
V. CONCLUSION
In this paper, search-based image retrieval and content-based image retrieval techniques are used to mine the huge quantity of poorly labeled images that are freely available on the web. The system uses an updated ULR algorithm to identify the images, a cluster approximation method to group similar images for scalability, and an association analysis method to track the number of times a particular image has been searched. Together, these methods improve scalability without degrading system performance. A future enhancement could be the retrieval of images based on time.
REFERENCES
[1] Dayong Wang, Steven C.H. Hoi, Ying He, and Jianke Zhu, "Mining Weakly Labeled Web Facial Images for Search-Based Face Annotation," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, January 2014.
[2] Preeti Chouhan and Mukesh Tiwari, "Feature Extraction Techniques for Image Retrieval Using Data Mining and Image Processing Techniques," IJARCCE, vol. 5, issue 5, May 2016.
[3] T. Geetha, "Fast Frequent Pattern Mining Using Vertical Data Format for Knowledge Discovery," International Journal of Emerging Research in Management & Technology, ISSN: 2278-9359, vol. 5, issue 5, May 2016.
[4] J. Zhu, S.C.H. Hoi, and L.V. Gool, "Unsupervised Face Alignment by Robust Nonrigid Mapping," Proc. 12th Int'l Conf. Computer Vision (ICCV), 2009.
[5] C. Siagian and L. Itti, "Rapid Biologically-Inspired Scene Classification Using Features Shared with Visual Attention," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 2, pp. 300-312, Feb. 2007.
[6] W. Dong, Z. Wang, W. Josephson, M. Charikar, and K. Li, "Modeling LSH for Performance Tuning," Proc. 17th ACM Conf. Information and Knowledge Management (CIKM), pp. 669-678, 2008.
[7] J. Zhu, S.C.H. Hoi, and M.R. Lyu, "Face Annotation Using Transductive Kernel Fisher Discriminant," IEEE Trans. Multimedia, vol. 10, no. 1, pp. 86-96, Jan. 2008.
[8] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-Based Image Retrieval at the End of the Early Years," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349-1380, Dec. 2000.