Neural Network-Based Human Recognition System Using Global and Motion Features

 

 

 

Advisor: Prof. Chin-Teng Lin

Student: 徐有德

 

 

Introduction

    The detection of moving objects, especially people or vehicles, has been widely used in surveillance systems. In recent years, many methods have been proposed for detecting faces and pedestrians through supervised classification. Almost all of these methods need to (1) segment moving objects from images and then (2) classify them as human or non-human. When a moving object appears in the scene, the system first detects it and determines its position; it then decides whether the moving object is human.

In our system, we mainly separate the work into two parts: (1) static human recognition and (2) dynamic tracking. In static human recognition, we first use the temporal differencing method to segment moving objects in a scene. After pre-processing, the moving-object training images are normalized and sent to PCA to extract their principal components, by which each image can be represented. Finally, a back-propagation neural network is trained on these data, and the weight matrix it generates is stored.

     In dynamic tracking, we first analyze the characteristics of human walking in the lateral and frontal views respectively, and then use these features to distinguish whether the object is a human or a human-shaped board.

 

Static Human Recognition

  This procedure can be divided into two primary parts: (1) off-line training and (2) on-line detection. Fig 1 shows the block diagram of off-line training, and Fig 2 shows that of on-line detection. The objective of off-line training is, first, to select the principal components that represent most of the information in each training image and, second, to generate through BP training a weight matrix by which we can judge whether a moving object is human. In brief, the whole off-line training procedure can be separated into three parts: (1) moving objects acquisition and normalization, (2) principal components selection, and (3) back-propagation neural network training.

                                                 

                Fig 1. Block diagram of off-line training        Fig 2. Block diagram of on-line detection

A. Off-line Training

 a. Moving Objects Acquisition and Normalization

    The ultimate goal of moving objects acquisition is to segment the moving objects in a scene. The block diagram of this procedure is shown in Fig 3. First, the consecutive input images are converted to gray level. Then the temporal differencing method segments moving objects from the scene, and a mean filter is applied to suppress noise. The processed images are binarized and dilated twice, and connected-component labeling groups the pixels that may belong to the same moving object. We then compute the area of the bounding rectangle that covers each moving object sharing the same label. If the area is larger than a threshold, the region is normalized; otherwise it is ignored.

                                                  

                Fig 3. Block diagram of moving objects acquisition and normalization
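As a rough sketch, the acquisition pipeline above can be written with NumPy and SciPy. The function name, the thresholds, and the fixed output size are illustrative assumptions, not the thesis implementation:

```python
import numpy as np
from scipy import ndimage

def acquire_moving_objects(prev_gray, curr_gray, diff_thresh=25,
                           area_thresh=200, out_size=(45, 90)):
    """Temporal differencing -> mean filter -> binarize -> dilate twice ->
    connected components -> area check -> normalize (illustrative values)."""
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    diff = ndimage.uniform_filter(diff.astype(np.float32), size=3)  # mean filter
    binary = ndimage.binary_dilation(diff > diff_thresh, iterations=2)
    labels, _ = ndimage.label(binary)  # group pixels of the same moving object
    regions = []
    for s in ndimage.find_objects(labels):
        h, w = s[0].stop - s[0].start, s[1].stop - s[1].start
        if h * w > area_thresh:  # keep only sufficiently large rectangles
            patch = curr_gray[s]
            # normalize to a fixed size by nearest-neighbor index sampling
            ys = np.linspace(0, h - 1, out_size[1]).astype(int)
            xs = np.linspace(0, w - 1, out_size[0]).astype(int)
            regions.append(patch[np.ix_(ys, xs)])
    return regions
```

Each returned region is a fixed-size gray-level patch ready for the PCA stage.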

b. Principle Components Selection

    After choosing enough normalized human and non-human training samples, we apply PCA to find their eigenvectors and select only the K eigenvectors corresponding to the K largest eigenvalues as our principal components. A larger K preserves more information about each image but increases the computation time as well.
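The selection step can be sketched in NumPy as follows. This is a minimal illustration (function names and shapes are assumptions) that uses the SVD of the centered data, whose right singular vectors are the eigenvectors of the covariance matrix:

```python
import numpy as np

def select_principal_components(samples, k):
    """PCA: rows of `samples` are flattened normalized images.
    Returns the mean image and the top-k principal components."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # SVD right singular vectors = eigenvectors of the covariance matrix,
    # already sorted by decreasing singular value (hence eigenvalue)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]                     # shape (k, n_pixels)

def project(image_vec, mean, components):
    """Represent an image by its k projection coefficients."""
    return components @ (image_vec - mean)
```

The k coefficients produced by `project` are what the network trains on.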

c. Multi-layer Back Propagation Neural Network Training

The objective of BP neural network training is to generate the weight matrix. Fig 4 shows some human and non-human training samples. We use 2000 human and 800 non-human normalized images as training samples. After training, a weight matrix is created; this weight matrix is used in on-line detection.

                    

                (a) Human training samples        (b) Non-human training samples

                Fig 4. Training samples
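A minimal sketch of such a one-hidden-layer back-propagation network is shown below; the hidden-layer size, learning rate, and epoch count are illustrative assumptions rather than the thesis settings:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bp(X, y, hidden=20, lr=0.5, epochs=2000, seed=0):
    """Batch back-propagation with squared error; X holds PCA coefficients,
    y holds labels (1 = human, 0 = non-human)."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))  # input -> hidden weights
    w2 = rng.normal(0.0, 0.1, (hidden, 1))           # hidden -> output weights
    for _ in range(epochs):
        h = sigmoid(X @ w1)                  # forward pass: hidden activations
        out = sigmoid(h @ w2)                # forward pass: network output
        d_out = (out - y[:, None]) * out * (1 - out)   # output-layer delta
        d_hid = (d_out @ w2.T) * h * (1 - h)           # back-propagated delta
        w2 -= lr * (h.T @ d_out)             # gradient-descent updates
        w1 -= lr * (X.T @ d_hid)
    return w1, w2

def classify(x, w1, w2, thresh=0.5):
    """True if the network output exceeds the decision threshold."""
    return bool(sigmoid(sigmoid(x @ w1) @ w2).item() > thresh)
```

The pair (w1, w2) plays the role of the stored weight matrix used later in on-line detection.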

 

B. On-line Detection

   When new consecutive images are imported on-line, we first perform the "Moving Objects Acquisition and Normalization" procedure described in off-line training. The pixel values of the normalized image are rearranged into a vector and then multiplied by the principal components and the weight matrix. The output of this computation determines whether the moving object is human or non-human.
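Put together, a single on-line decision might look like the following sketch (the helper name and the 0.5 decision threshold are assumptions):

```python
import numpy as np

def detect_online(norm_image, mean, components, w1, w2, thresh=0.5):
    """Project a normalized moving-object image onto the stored principal
    components, then pass the coefficients through the trained weights."""
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    coeffs = components @ (norm_image.ravel() - mean)   # PCA projection
    out = sigmoid(sigmoid(coeffs @ w1) @ w2)            # network output
    return bool(out.item() > thresh)                    # True = human
```

Here `mean`, `components`, `w1`, and `w2` are exactly the quantities stored during off-line training.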

Dynamic Tracking

      In static human recognition, once a human-shaped board appears in the observed scene, it is classified as human as well. In this section, we use motion features to recognize which object is a real human. Basically, we separate human walking into two prototypes, (1) lateral and (2) frontal, and use different motion features to distinguish the moving objects. The precondition, of course, is that the moving object has already been classified as human in static recognition.

A. In Lateral

        In the lateral view, the width of the moving object changes while it moves; Fig 5 illustrates this phenomenon. In most situations the width variation is periodic, so its FFT power spectrum has a large peak at a specific frequency, as shown in Fig 6. We therefore examine the power spectrum and compute the difference between the largest and the second-largest energy; if this difference exceeds a threshold, the moving object is judged to be human. Obviously, a human-shaped board shows no such width variation, since it is rigid.

 

                Fig 5. Width variation in consecutive frames        Fig 6. FFT power spectrum
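The width-based test above can be sketched as follows; the energy-gap threshold is an illustrative assumption:

```python
import numpy as np

def is_walking_human(widths, energy_gap=5.0):
    """Decide from a sequence of bounding-box widths whether the variation is
    periodic, by comparing the two largest energies in the FFT power spectrum."""
    w = np.asarray(widths, dtype=float)
    power = np.abs(np.fft.rfft(w - w.mean())) ** 2   # drop DC, one-sided power
    peaks = np.sort(power)[::-1]
    return bool(peaks[0] - peaks[1] > energy_gap)    # large gap => periodic
```

A periodic width sequence concentrates its energy in one frequency bin, so the gap between the two largest energies is large; a rigid board yields an almost flat spectrum.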

 

B. In Frontage

    In the frontal view, we propose a new feature to distinguish a human from a human-shaped board. While a human walks frontally, as Fig 7 shows, the only changing parts are the hands and feet. However, when a human puts his hands in his pockets or carries a handbag, the hands may be covered, so we choose the legs as our feature. In the normalized image, the knee is usually at height position 70, and the feet, free of shadow interference, are at about position 90, as shown in Fig 7 (covered by a red rectangle). We then observe the movement of the center of gravity of this region. From frame 1 to 13, the human steps with his left foot, and the center of gravity moves left (toward the origin). From frame 17 to 29, as he steps with his right foot, the center of gravity moves right. In Fig 8, although some noise disturbs this pattern, the trend still follows the direction we assume. A human-shaped board does not exhibit this phenomenon.

 

    

                Fig 7. Human walking in frontage        Fig 8. Movement of gravity
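A sketch of the leg-region gravity measurement, assuming binary silhouette frames of normalized height (rows 70 to 90 follow the positions quoted above; everything else is illustrative):

```python
import numpy as np

def leg_gravity_x(frame, knee_row=70, feet_row=90):
    """Horizontal center of gravity of the leg region (rows 70..90) of a
    normalized binary silhouette; NaN if the region is empty."""
    _, xs = np.nonzero(frame[knee_row:feet_row])
    return xs.mean() if xs.size else float('nan')

def gravity_track(frames, knee_row=70, feet_row=90):
    """Per-frame gravity positions; a walking human alternates left/right,
    while a human-shaped board stays flat."""
    return [leg_gravity_x(f, knee_row, feet_row) for f in frames]
```

Plotting the returned track over consecutive frames reproduces the alternating left/right trend described above.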

 

Experiments

Table 1 shows the results when the training samples are also used as testing samples. The left column shows the value of K. In recognizing human, K=89 and K=111 achieve a 100% accuracy rate, while K=60 achieves 96.35%. All of them perform well in recognizing non-human. Furthermore, we chose 215 human and 90 non-human images from new video as testing samples; Table 2 shows the results. In recognizing human, K=89 performs best. In recognizing non-human, although K=60 gives the best result, the primary goal here is to detect human, so we select 89 as the number of principal components.

 

Table 1. Recognition accuracy using the training samples as testing samples

    K      Human      Non-human
    60     96.35%     100%
    89     100%       100%
    111    100%       100%

 

Table 2. Recognition accuracy on new testing samples (215 human, 90 non-human)

    K      Human      Non-human
    60     93%        95.56%
    89     96.3%      91.11%
    111    95.81%     91.11%


We implemented our human detection system in real time on a Pentium 4 2.6 GHz machine running Microsoft Windows XP. The program was developed in Borland C++ Builder 6.0. We set up a camera at several places on the National Chiao-Tung University (NCTU) campus.

We use our program to detect humans in 4 different situations: (a) humans walking in different directions, (b) human and non-human recognition, (c) humans carrying a bag or an umbrella, and (d) multiple human and non-human moving objects. If a moving object is classified as human, it is enclosed by a red rectangle; if classified as non-human, by a blue rectangle.

  a. Human walking in different directions

     As Fig 9 shows, when two humans walk in different directions, our system detects both of them well, which means our system is invariant to walking direction.

    

       Fig 9. Human walking in different directions

 b. Human and non-human recognition

     When human and non-human moving objects appear in the observed scene at the same time, our system can distinguish them. As Fig 10 shows, the ball is enclosed by a blue rectangle and the human by a red one.

    

                 Fig 10. Human and ball

 c. Human carrying bag and umbrella

     We do not require that humans appearing in the observed scene carry nothing. In Fig 11, the left human carries a backpack and the right one holds an umbrella; our system detects both as human.

     

     Fig 11. Human carrying backpack and umbrella

 d. Multiple human and non-human moving objects 

      If there are multiple human and non-human moving objects in the observed scene, our system can also find each of them and, furthermore, count them. In Fig 12, there are 3 people and 2 non-human moving objects generated by moving leaves. Our system separates and recognizes each of them well.

     

                  Fig 12. Multiple moving objects

Conclusion

In this thesis, we present a new method for human detection. In static human recognition, we combine PCA and a multi-layer neural network to find humans in the observed scenes. First we segment all the moving objects and normalize their size; then each image is multiplied by the principal components and the weight matrix generated by the BP neural network in off-line training. Depending on the output value, we judge whether the moving object is human or non-human. In dynamic tracking, human walking is separated into two prototypes: (1) lateral and (2) frontal. We use the width variation as the feature in the lateral view and the gravity movement of a specific part of the body as the feature in the frontal view.

     Our human detection system possesses some specific features:

  1. The system learns from examples. It is trained on a large set of training images and does not need any a priori handcrafted models.

  2. We use a global feature to accomplish human detection, which gives a correspondingly better accuracy rate.

  3. In dynamic tracking, we propose a new feature by which we can distinguish a human from a human-shaped board. This feature could also be used in further research.

     The pedestrian detection system can be used in many applications. In a burglarproof system, it triggers the alarm and captures the thieves' images when they invade. In environments with a limit on the number of people, it can count the people and give a warning when the count approaches the limit.
